Abstract

The classical condition on the existence of uniformly exponentially consistent tests for testing the true density against the complement of an arbitrary neighborhood of it has been widely adopted in the study of asymptotics of Bayesian nonparametric procedures. Because we follow a Bayesian approach, it seems more natural to explore alternative conditions which incorporate the prior distribution. In this paper we supply a new prior-dependent integration condition to establish general posterior convergence rate theorems for observations which may not be independent and identically distributed. The posterior convergence rates for such observations have recently been studied by Ghosal and van der Vaart [5]. We moreover adopt the Hausdorff α-entropy given by Xing and Ranneby [18][16], which is also prior-dependent and smaller than the widely used metric entropies. These lead to extensions of several existing theorems. In particular, we establish a posterior convergence rate theorem for general Markov processes and, as an application, we improve on the currently known posterior rate of convergence for a nonlinear autoregressive model.
1 Introduction



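The aim of this article is to study the asymptotic behavior of posterior distributions based on observations which are not assumed to be independent and identically distributed. Suppose that $(\mathfrak{X}^{(n)}, \mathcal{A}^{(n)}, P_\theta^{(n)} : \theta \in \Theta)$, $n = 1, 2, \ldots$, are statistical experiments with observations $X^{(n)}$, where the parameter set $\Theta$ does not depend on the index $n$, and suppose that the distributions $P_\theta^{(n)}$ for all $\theta \in \Theta$ admit densities $p_\theta^{(n)}$ relative to a $\sigma$-finite measure $\mu^{(n)}$ on $\mathfrak{X}^{(n)}$. Denote by $\theta_0$ the true parameter generating the observations $X^{(n)}$. Assume that $P_{\theta_0}^{\infty}$ is the infinite product measure $P_{\theta_0}^{(1)} P_{\theta_0}^{(2)} \cdots P_{\theta_0}^{(n)} \cdots$ on the product space $\bigotimes_{n=1}^{\infty} \mathfrak{X}^{(n)}$. In the sense that each $B \subset \mathfrak{X}^{(n)}$ is identified with the subset $(\mathfrak{X}^{(1)}, \mathfrak{X}^{(2)}, \ldots, \mathfrak{X}^{(n-1)}, B, \mathfrak{X}^{(n+1)}, \ldots)$ of the product space, we have that $P_{\theta_0}^{\infty} = P_{\theta_0}^{(n)}$ holds on $\mathfrak{X}^{(n)}$ for all $n$. In other words, $P_{\theta_0}^{\infty}$ is the distribution of the sequence $(X^{(1)}, X^{(2)}, \ldots)$ under which the observations $X^{(n)}$ are independent, each drawn from $P_{\theta_0}^{(n)}$. Let $d_n$ be a semimetric on $\Theta$. Note that any semimetric $d_n(p_{\theta_1}^{(n)}, p_{\theta_2}^{(n)})$ on the space of densities defined on $\mathfrak{X}^{(n)}$ naturally induces a semimetric $d_n(\theta_1, \theta_2) = d_n(p_{\theta_1}^{(n)}, p_{\theta_2}^{(n)})$ on $\Theta$ when the mapping $\theta \mapsto p_\theta^{(n)}$ is one-to-one, which is assumed in the paper. Given a prior $\Pi_n$ on $\Theta$, the posterior distribution $\Pi_n(\cdot \mid X^{(n)})$ is a random probability measure given by

$$\Pi_n(B \mid X^{(n)}) = \frac{\int_B p_\theta^{(n)}(X^{(n)})\,\Pi_n(d\theta)}{\int_\Theta p_\theta^{(n)}(X^{(n)})\,\Pi_n(d\theta)} = \frac{\int_B R_\theta^{(n)}(X^{(n)})\,\Pi_n(d\theta)}{\int_\Theta R_\theta^{(n)}(X^{(n)})\,\Pi_n(d\theta)}$$

for each measurable subset $B$ in $\Theta$, where $R_\theta^{(n)}(X^{(n)}) = p_\theta^{(n)}(X^{(n)}) / p_{\theta_0}^{(n)}(X^{(n)})$ stands for the likelihood ratio. Recall that the posterior distribution $\Pi_n(\cdot \mid X^{(n)})$ is said to be convergent almost surely at a rate at least $\varepsilon_n$ if there exists $r > 0$ such that $\Pi_n(\theta \in \Theta : d_n(\theta, \theta_0) \ge r\varepsilon_n \mid X^{(n)}) \to 0$ almost surely as $n \to \infty$. Similarly, $\Pi_n(\cdot \mid X^{(n)})$ is said to be convergent in probability at a rate at least $\varepsilon_n$ if, for any sequence $r_n$ tending to infinity, $\Pi_n(\theta \in \Theta : d_n(\theta, \theta_0) \ge r_n\varepsilon_n \mid X^{(n)}) \to 0$ in probability as $n \to \infty$. Throughout this paper, almost sure convergence and convergence in probability are understood to be defined with respect to $P_{\theta_0}^{\infty}$.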
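The definitions above can be made concrete on a finite parameter grid. The following is only a minimal sketch under assumptions that are not in the paper: an i.i.d. $N(\theta_0, 1)$ model, a uniform prior on a hypothetical grid, the metric $|\theta - \theta_0|$, and the candidate rate $\varepsilon_n = \sqrt{\log n / n}$. It computes the posterior mass outside the ball of radius $r\varepsilon_n$, which is the quantity required to tend to zero. The function names are illustrative.

```python
import math
import random


def posterior_weights(log_lik, prior):
    """Posterior probabilities on a finite grid from log-likelihoods and prior weights."""
    m = max(log_lik)  # subtract the maximum for numerical stability
    unnorm = [p * math.exp(l - m) for l, p in zip(log_lik, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def mass_outside_ball(thetas, post, theta0, radius):
    """Posterior mass of {theta : |theta - theta0| > radius}."""
    return sum(w for t, w in zip(thetas, post) if abs(t - theta0) > radius)


def outside_mass(n, r=2.0, theta0=0.3, seed=0):
    """Mass outside the ball of radius r * eps_n for n i.i.d. N(theta0, 1) draws."""
    rng = random.Random(seed)
    thetas = [i / 100 for i in range(-100, 101)]   # hypothetical parameter grid
    prior = [1.0 / len(thetas)] * len(thetas)      # uniform prior on the grid
    xs = [rng.gauss(theta0, 1.0) for _ in range(n)]
    sx = sum(xs)
    # Gaussian log-likelihood up to a theta-free constant; the constant cancels,
    # just as p_{theta_0} cancels in the likelihood-ratio form of the posterior.
    log_lik = [t * sx - 0.5 * n * t * t for t in thetas]
    post = posterior_weights(log_lik, prior)
    eps_n = math.sqrt(math.log(n) / n)
    return mass_outside_ball(thetas, post, theta0, r * eps_n)


if __name__ == "__main__":
    for n in (50, 500, 5000):
        print(n, outside_mass(n))
```

Working with log-likelihoods and subtracting the maximum avoids underflow for large $n$; this is a numerical convenience and not part of the theory.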

Asymptotics of Bayesian nonparametric procedures has been the focus of a considerable amount of research during the past three decades. Much work has been concerned with the asymptotic behavior of posterior distributions for i.i.d. observations; see, for instance, Barron, Schervish and Wasserman [1], Ghosal, Ghosh and van der Vaart [4], Shen and Wasserman [9] and Walker, Lijoi and Prünster [14]. Recently, Ghosal and van der Vaart [5] proved several types of posterior convergence rate theorems for non-i.i.d. observations. Their results rely upon the existence of uniformly exponentially consistent tests, combined with the metric entropy condition and the prior concentration rate. Both the existence of uniformly exponentially consistent tests and the metric entropy condition depend on models, but not on priors. Since the posterior depends on the complexity of the model only through the prior, it is therefore of interest to explore alternative conditions which incorporate priors. In this paper we use an integration condition together with the Hausdorff α-entropy to study convergence rates of posteriors. The integration condition and the Hausdorff α-entropy are both prior-dependent. We show that the integration condition is weaker than the existence of uniformly exponentially consistent tests and holds automatically for an interesting class of metrics used to describe rates of convergence. The latter fact leads to an extension of the results for i.i.d. observations in Walker [11] and Xing [16], in which construction of such tests is not necessarily required in order to obtain posterior consistency. The integration condition is moreover useful in the construction of priors, as shown when we prove that the convergence rates of the pseudoposteriors given by Walker and Hjort [13] do not depend on the metric entropy condition. The Hausdorff α-entropy condition was introduced in Xing and Ranneby [18][16] and it is weaker than the metric entropy condition. By means of the integration condition and the Hausdorff α-entropy, we establish general posterior convergence rate theorems both in the almost sure sense and in the in-probability sense. In particular, we obtain convergence rate theorems of pseudoposteriors and posteriors for independent observations. We also prove a posterior convergence rate theorem for general Markov chains, which is an extension of a result for stationary α-mixing Markov chains given by Ghosal and van der Vaart ([5], Theorem 5). As an application we improve on the posterior rate of convergence for the nonlinear autoregressive model; see Section 7.4 of Ghosal and van der Vaart [5]. Many authors have studied Bayesian convergence rates for the Gaussian white noise model with a conjugate Gaussian prior (or, equivalently, one has independent normally distributed observations $N(\theta_i, 1/n)$, $i = 1, 2, \ldots$, and puts a Gaussian prior independently on $\theta_i$, $i = 1, 2, \ldots$); see for instance Ghosal and van der Vaart [5], Scricciolo [8], Shen and Wasserman [9] and Zhao [20]. Now, by our general posterior convergence rate theorem, we extend their results to multivariate normally distributed observations which may not be independent.

The paper is organized as follows. In Section 2 we introduce a prior-dependent integration assumption and present several different types of general posterior convergence rate theorems. Section 3 contains applications of our general results to independent observations and Markov chains. Section 4 contains concrete applications, including a nonlinear autoregression model, an infinite-dimensional normal model and priors based on uniform distributions. The technical proofs are collected in the Appendix.

Throughout this paper the notation $a \lesssim b$ means $a \le Cb$ for some positive constant $C$ which is universal or fixed in the proof. Write $a \approx b$ if $a \lesssim b$ and $b \lesssim a$. Denote $Pf^{\alpha} = \int_{\mathfrak{X}} f^{\alpha}\,dP$, which is the integral of the nonnegative function $f$ with power $\alpha$ relative to the measure $P$ on $\mathfrak{X}$.



2 General Convergence rate theorems 

In this section we introduce a new prior-dependent integration condition to study consistency of posterior distributions. The integration condition is shown to be automatically fulfilled by a large number of metrics. Together with the Hausdorff α-entropy, this integration condition plays a central role in our versions of general Bayesian convergence rate theorems.

Let us begin with the following assumption given by Ghosal and van der Vaart [5], in which they instead equivalently used a constant multiple of the semimetric $e_n$.

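Assumption 1. Let $K$ be a positive constant. Assume that $\{d_n\}$ and $\{e_n\}$ are two sequences of semimetrics on $\Theta$ such that for every $n$, $\varepsilon > 0$ and $\theta_1 \in \Theta$ with $d_n(\theta_1, \theta_0) > \varepsilon$, there exists a test $\phi_n$ satisfying

$$P_{\theta_0}^{(n)}\phi_n \le e^{-Kn\varepsilon^2} \quad\text{and}\quad \inf_{\theta \in \Theta:\ e_n(\theta, \theta_1) < \varepsilon} P_\theta^{(n)}\phi_n \ge 1 - e^{-Kn\varepsilon^2}.$$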

Based on Assumption 1, Ghosal and van der Vaart [5] established a series of general Bayesian convergence rate theorems. Assumption 1 does not depend on the prior distribution. Note that the posterior depends on the complexity of the model only through the prior. As far as the Bayesian approach is concerned, it would be interesting to find conditions incorporating the prior in the study of asymptotic properties. In the following we give such a prior-dependent condition.

Recall that the Hausdorff α-entropy $J(\delta, \Theta_1, \alpha, e_n)$ for $\Theta_1 \subset \Theta$ is the logarithm of the minimal sum of $\alpha$-th powers of prior masses of balls of $e_n$-radius $\le \delta$ needed to cover $\Theta_1$; see Xing [17] and Xing and Ranneby [18] for the details of the Hausdorff α-entropy. For simplicity of notation, we define the Hausdorff α-constant $C(\delta, \Theta_1, \alpha, e_n) := e^{J(\delta, \Theta_1, \alpha, e_n)}$ of any subset $\Theta_1$ of $\Theta$. Observe that $C(\delta, \Theta_1, \alpha, e_n)$ depends on the prior $\Pi_n$. It was proved in Xing and Ranneby [18] that the inequality

$$\Pi_n(\Theta_1)^{\alpha} \le C(\delta, \Theta_1, \alpha, e_n) \le \Pi_n(\Theta_1)^{\alpha}\, N(\delta, \Theta_1, e_n)^{1-\alpha}$$

holds for any $0 < \alpha < 1$, where $N(\delta, \Theta_1, e_n)$ denotes the minimal number of balls of $e_n$-radius $\le \delta$ needed to cover $\Theta_1 \subset \Theta$. Our prior-dependent integration condition is

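Assumption 2. Let $\{d_n\}$ and $\{e_n\}$ be two sequences of semimetrics on $\Theta$. For some $\alpha \in (0, 1)$ there exist constants $K_1 > 0$, $K_2 > 0$ and $K_3 \ge 0$ such that the inequality

$$P_{\theta_0}^{(n)}\Big(\int_{\theta \in \Theta_1:\ d_n(\theta, \theta_0) > \varepsilon} R_\theta^{(n)}(X^{(n)})\,\Pi_n(d\theta)\Big)^{\alpha} \le K_1\, e^{-K_2 n\varepsilon^2}\, C\big(\varepsilon, \{\theta \in \Theta_1 : d_n(\theta, \theta_0) > \varepsilon\}, \alpha, e_n\big)^{K_3}$$

holds for any $\varepsilon > 0$, $\Theta_1 \subset \Theta$ and for all $n$ large enough.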
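The Hausdorff α-constant is a minimum over all admissible covers, which is rarely computable directly. As a rough illustration of why it sits between $\Pi_n(\Theta_1)^{\alpha}$ and $\Pi_n(\Theta_1)^{\alpha} N^{1-\alpha}$, the sketch below evaluates the sum of $\alpha$-th powers for one fixed cover. It assumes, purely for illustration, that the covering sets are disjoint and lie inside $\Theta_1$, and the prior masses are made-up numbers. The value for a fixed cover is only an upper bound for the true α-constant.

```python
def alpha_constant_bounds(masses, alpha):
    """Return (lower, cover_sum, upper) for one fixed disjoint cover.

    masses : prior masses of the covering sets (hypothetical numbers)
    lower  : Pi(Theta_1)^alpha
    cover_sum : sum of alpha-th powers, an upper bound for the alpha-constant
    upper  : Pi(Theta_1)^alpha * N^(1 - alpha), with N the number of sets
    """
    total = sum(masses)
    n_sets = len(masses)
    lower = total ** alpha
    cover_sum = sum(m ** alpha for m in masses if m > 0)
    upper = total ** alpha * n_sets ** (1 - alpha)
    return lower, cover_sum, upper


# Hypothetical prior masses of N = 5 disjoint covering sets of Theta_1.
masses = [0.30, 0.05, 0.10, 0.02, 0.03]
lower, cover_sum, upper = alpha_constant_bounds(masses, alpha=0.5)
```

The lower bound uses the subadditivity of $x \mapsto x^{\alpha}$ and the upper bound uses its concavity (Jensen's inequality), both valid for $0 < \alpha < 1$.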

We usually take $K_3 = 1$ but here we let $K_3 \ge 0$ in order to increase the scope of applicability. It was shown in Xing [17] that Assumption 2 holds when the observations are i.i.d. and $r e_n = d_n = d$ for some constant $r > 2$ and some metric $d$ which is dominated by the Hellinger distance. The integral in Assumption 2 depends on the prior $\Pi_n$ and hence is trivially equal to zero when $\Pi_n$ puts zero mass outside of $\theta_0$. So Assumption 2 cannot generally imply Assumption 1. In fact, Assumption 2 is weaker than Assumption 1, as shown in the following.

Proposition 1. Assumption 1 implies Assumption 2 for all $0 < \alpha < 1$, where one can choose $K_1 = 2$, $K_2 = (1 - \alpha)K \wedge \alpha K$ and $K_3 = 1$.

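We shall use the Hellinger distance $H(f, g) = \|\sqrt{f} - \sqrt{g}\|_2$ and its modification $H_*(f, g) = \big\|(\sqrt{f} - \sqrt{g})\big(\tfrac{2}{3}\sqrt{f/g} + \tfrac{1}{3}\big)^{1/2}\big\|_2$, where $\|f\|_p = \big(\int |f|^p\,d\mu\big)^{1/p}$. The inequalities $\tfrac{1}{\sqrt{3}} H(f, g) \le H_*(f, g) \le \|f/g\|_\infty^{1/4}\, H(f, g)$ hold for all densities $f$ and $g$, since $\|f/g\|_\infty \ge 1$. The quantity $H_*$ was used by Xing [16] in computation of prior concentration rates. Denote

$$W_n(\theta_0, \varepsilon) = \Big\{\theta \in \Theta : H_*\big(p_{\theta_0}^{(n)}, p_\theta^{(n)}\big)^2 \le \tfrac{2}{3}\big(e^{\frac{3}{2} n\varepsilon^2} - 1\big)\Big\}.$$

Note that $W_n(\theta_0, \varepsilon)$ contains the set $\{\theta \in \Theta : H_*(p_{\theta_0}^{(n)}, p_\theta^{(n)}) \le \sqrt{n}\,\varepsilon\}$ because $n\varepsilon^2 \le \tfrac{2}{3}\big(e^{\frac{3}{2} n\varepsilon^2} - 1\big)$. The following proposition shows that Assumption 2 holds automatically when $d_n = e_n = d_n^0$ for some metrics $d_n^0$ such that $d_n^0(\theta, \theta_1)^s$ is a convex function of $\theta$ and

$$d_n^0(\theta_1, \theta_2)^2 \le -\frac{2}{n}\log\int\sqrt{p_{\theta_1}^{(n)}\, p_{\theta_2}^{(n)}}\;d\mu^{(n)} \qquad (1)$$

for all $n$ and $\theta_1, \theta_2 \in \Theta$, where $s$ is a fixed positive constant. Throughout this paper we let $d_n^0$ stand for a metric with this property.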
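The relation between $H$ and $H_*$ can be checked numerically on discrete densities. This is a sketch under an assumption: it takes $H_*(f,g)^2 = \int (\sqrt f - \sqrt g)^2 \big(\tfrac23\sqrt{f/g} + \tfrac13\big)\,d\mu$ as the form of the modified distance, in which case expanding the square gives the closed form $H_*(f,g)^2 = \tfrac23\big(\int f^{3/2} g^{-1/2}\,d\mu - 1\big)$. The two probability vectors are arbitrary examples.

```python
import math


def hellinger(f, g):
    """Hellinger distance between two discrete densities."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(f, g)))


def hellinger_star(f, g):
    """Modified Hellinger distance H_* (assumed form, see the lead-in)."""
    return math.sqrt(sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2 * (2 / 3 * math.sqrt(a / b) + 1 / 3)
        for a, b in zip(f, g)))


f = [0.5, 0.3, 0.2]          # arbitrary example densities on three points
g = [0.2, 0.3, 0.5]
h = hellinger(f, g)
hs = hellinger_star(f, g)
ratio_sup = max(a / b for a, b in zip(f, g))   # sup-norm of f / g
# Closed form obtained by expanding the square and using sum(f) = sum(g) = 1.
closed_form = 2 / 3 * (sum(a ** 1.5 / math.sqrt(b) for a, b in zip(f, g)) - 1)
```

The closed form is what makes $H_*$ convenient for product densities: the integral $\int f^{3/2} g^{-1/2}\,d\mu$ factorizes over independent coordinates.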

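Proposition 2. Let $0 < \delta < 1/2$ and $0 < \alpha < 1$. Then the inequality

$$P_{\theta_0}^{(n)}\Big(\int_{\theta \in \Theta_1:\ d_n^0(\theta, \theta_0) > \varepsilon} R_\theta^{(n)}(X^{(n)})\,\Pi_n(d\theta)\Big)^{\alpha} \le 2\, e^{-\frac{1}{2}(1-\alpha)(1-2\delta)^2 n\varepsilon^2}\, C\big(\delta\varepsilon, \{\theta \in \Theta_1 : d_n^0(\theta, \theta_0) > \varepsilon\}, \alpha, d_n^0\big)$$

holds for all $n$, $\varepsilon > 0$ and $\Theta_1 \subset \Theta$.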

Another advantage of adopting Assumption 2 is that it makes it easier to construct prior distributions $\Pi_n$ which achieve good posterior convergence rates. Here we present a result which implies that Assumption 2 with $K_3 = 0$ holds for data-dependent priors $\Pi_n(d\theta)/p_\theta^{(n)}(X^{(n)})^{1-\beta}$ for any given constant $0 < \beta < 1$. Data-dependent priors have been studied by Wasserman [15], Walker and Hjort [13] and Xing and Ranneby [19].

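Proposition 3. The inequality

$$P_{\theta_0}^{(n)}\Big(\int_{\theta \in \Theta_1:\ d_n^0(\theta, \theta_0) > \varepsilon} R_\theta^{(n)}(X^{(n)})^{\beta}\,\Pi_n(d\theta)\Big)^{\alpha} \le e^{-((1-\beta)\wedge\beta)\,\alpha n\varepsilon^2}\; \Pi_n\big(\theta \in \Theta_1 : d_n^0(\theta, \theta_0) > \varepsilon\big)^{\alpha}$$

holds for all $n$, $0 < \alpha < 1$, $0 < \beta < 1$, $\varepsilon > 0$ and $\Theta_1 \subset \Theta$.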

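Now we are ready to present the first main result of this paper.

Theorem 1. Suppose that Assumption 2 holds and that $\varepsilon_n > 0$, $n\varepsilon_n^2 \ge c_0 \log n$ for all large $n$ and some fixed constant $c_0 > 0$. Suppose that there exist a constant $c_1 < K_2$ and a sequence of subsets $\Theta_n$ of $\Theta$ such that

$$C\big(j\varepsilon_n, \{\theta \in \Theta_n : j\varepsilon_n < d_n(\theta, \theta_0) \le 2j\varepsilon_n\}, \alpha, e_n\big)^{K_3} \le e^{c_1 j^2 n\varepsilon_n^2}\, \Pi_n\big(W_n(\theta_0, \varepsilon_n)\big)^{\alpha} \qquad (2)$$

for all sufficiently large integers $j$ and $n$. Then for each $r$ large enough we have that

$$\Pi_n\big(\theta \in \Theta_n : d_n(\theta, \theta_0) \ge r\varepsilon_n \mid X^{(n)}\big) \to 0$$

almost surely as $n \to \infty$. If furthermore there exists $c_2 > 1/c_0$ such that

$$\sum_{n=1}^{\infty} \frac{e^{n\varepsilon_n^2(3 + 2c_2)}\, \Pi_n(\Theta \setminus \Theta_n)}{\Pi_n\big(W_n(\theta_0, \varepsilon_n)\big)} < \infty,$$

then there exists a constant $b > 0$ such that for each large $r$ and all large $n$,

$$\Pi_n\big(\theta \in \Theta : d_n(\theta, \theta_0) \ge r\varepsilon_n \mid X^{(n)}\big) \le e^{-bn\varepsilon_n^2} \quad\text{almost surely,}$$

which tends to zero as $n \to \infty$.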

Under Assumption 1 and $\varepsilon_n \ge n^{-\gamma}$ with $0 < \gamma < 1/2$, Ghosal and van der Vaart ([5], Theorem 2) proved an almost sure convergence rate theorem and obtained that $P_{\theta_0}^{(n)}\Pi_n(\theta \in \Theta_n : d_n(\theta, \theta_0) \ge r_n\varepsilon_n \mid X^{(n)}) = O(\varepsilon_n^2)$ for every $r_n \to \infty$. This upper bound is slower than the bound $e^{-bn\varepsilon_n^2}$ of Theorem 1, and moreover Theorem 1 can be applied to obtain posterior convergence at the rate $\varepsilon_n = \sqrt{\log n / n}$. Note that when $K_3 = 0$ the inequality (2) follows from $\Pi_n(W_n(\theta_0, \varepsilon_n)) \ge e^{-c_1 n\varepsilon_n^2}$. So Theorem 1 shows that in the special case $K_3 = 0$ the concentration rate is precisely equal to the convergence rate. We also mention that in the case that the set $\Theta$ is convex and $d_n(\theta, \theta_0)^s$ for some constant $s > 0$ is a bounded convex function of $\theta$ in $\Theta$, it follows from Jensen's inequality that the posterior expectation $\hat\theta_n := \int \theta\, d\Pi_n(\theta \mid X^{(n)})$ under the assumptions of Theorem 1 yields a point estimator of $\theta_0$ with convergence rate at least $\varepsilon_n$. Together with Proposition 2, Theorem 1 implies the following direct consequence for the metric $d_n^0$.

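Corollary 1. Suppose that $\varepsilon_n > 0$, $n\varepsilon_n^2 \ge c_0 \log n$ for all large $n$ and some fixed constant $c_0 > 0$. Suppose that there exist $0 < \alpha < 1$, $0 < \delta < 1/2$ and $c_1 < \tfrac{1}{2}(1 - \alpha)(1 - 2\delta)^2$ such that

$$C\big(\delta j\varepsilon_n, \{\theta \in \Theta : j\varepsilon_n < d_n^0(\theta, \theta_0) \le 2j\varepsilon_n\}, \alpha, d_n^0\big) \le e^{c_1 j^2 n\varepsilon_n^2}\, \Pi_n\big(W_n(\theta_0, \varepsilon_n)\big)^{\alpha}$$

for all sufficiently large integers $j$ and $n$. Then there exists a constant $b > 0$ such that for each large $r$ and all large $n$,

$$\Pi_n\big(\theta \in \Theta : d_n^0(\theta, \theta_0) \ge r\varepsilon_n \mid X^{(n)}\big) \le e^{-bn\varepsilon_n^2} \quad\text{almost surely,}$$

which tends to zero as $n \to \infty$.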
It is also worth pointing out that from Lemma 1 in Xing and Ranneby [18] it follows that the inequality (2) can be derived from the following two inequalities:



and 

for some constants $c_3$ and $c_4$ with $c_3 + c_4 < K_2$. Thus, we have the following consequence.

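Corollary 2. Suppose that Assumption 2 holds and that $\varepsilon_n > 0$, $n\varepsilon_n^2 \ge c_0 \log n$ for all large $n$ and some fixed constant $c_0 > 0$. Suppose that there exist constants $c_1, c_2, c_3$ with $c_1(1 - \alpha) + c_2\alpha < K_2$ and $c_3 > 1/c_0$ and there exists a sequence of subsets $\Theta_n$ of $\Theta$ such that for all large $j$ and $n$,

(i) $N\big(j\varepsilon_n, \{\theta \in \Theta_n : j\varepsilon_n < d_n(\theta, \theta_0) \le 2j\varepsilon_n\}, e_n\big)^{K_3} \le e^{c_1 j^2 n\varepsilon_n^2}$;

(ii) $\Pi_n\big(\theta \in \Theta_n : j\varepsilon_n < d_n(\theta, \theta_0) \le 2j\varepsilon_n\big)^{K_3} \le e^{c_2 j^2 n\varepsilon_n^2}\, \Pi_n\big(W_n(\theta_0, \varepsilon_n)\big)$;

(iii) $\displaystyle\sum_{n=1}^{\infty} \frac{e^{n\varepsilon_n^2(3 + 2c_3)}\, \Pi_n(\Theta \setminus \Theta_n)}{\Pi_n\big(W_n(\theta_0, \varepsilon_n)\big)} < \infty$.

Then there exists a constant $b > 0$ such that for each large $r$ and all large $n$,

$$\Pi_n\big(\theta \in \Theta : d_n(\theta, \theta_0) \ge r\varepsilon_n \mid X^{(n)}\big) \le e^{-bn\varepsilon_n^2} \quad\text{almost surely,}$$

which tends to zero as $n \to \infty$.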

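Our next theorem gives another version of Theorem 1.

Theorem 2. The following statements are true.

(a) Theorem 1 holds if the inequality (2) is replaced by

$$C(\varepsilon_n, \Theta_n, \alpha, e_n)^{K_3} \le e^{c_1 n\varepsilon_n^2}\, \Pi_n\big(W_n(\theta_0, \varepsilon_n)\big)^{\alpha} \quad\text{for all large } n.$$

(b) Corollary 2 holds if both (i) and (ii) are replaced by

$$N(\varepsilon_n, \Theta_n, e_n)^{K_3} \le e^{c_1 n\varepsilon_n^2} \quad\text{and}\quad \Pi_n(\Theta_n)^{K_3} \le e^{c_2 n\varepsilon_n^2}\, \Pi_n\big(W_n(\theta_0, \varepsilon_n)\big).$$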






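In order to deal with convergence rates of posterior distributions in the in-probability sense, following Ghosal and van der Vaart [5], we adopt the notations $V_k(f, g) = \int_{\mathfrak{X}^{(n)}} f\,|\log(f/g)|^k\,d\mu^{(n)}$ and $V_{k,0}(f, g) = \int_{\mathfrak{X}^{(n)}} f\,|\log(f/g) - K(f, g)|^k\,d\mu^{(n)}$, where $K(f, g) = \int_{\mathfrak{X}^{(n)}} f\log(f/g)\,d\mu^{(n)}$ is the Kullback-Leibler divergence of densities $f$ and $g$. Denote

$$B_n(\theta_0, \varepsilon; k) = \big\{\theta \in \Theta : K\big(p_{\theta_0}^{(n)}, p_\theta^{(n)}\big) \le n\varepsilon^2,\ V_{k,0}\big(p_{\theta_0}^{(n)}, p_\theta^{(n)}\big) \le n^{k/2}\varepsilon^k\big\}.$$

Our result in this direction is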

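Theorem 3. Suppose that Assumption 2 holds and that $k > 1$, $\varepsilon_n > 0$, $n\varepsilon_n^2 \ge c_0$ for all large $n$ and some fixed constant $c_0 > 0$. Suppose that there exist a constant $c_1 < K_2$ and a sequence of subsets $\Theta_n$ of $\Theta$ such that

$$C\big(j\varepsilon_n, \{\theta \in \Theta_n : j\varepsilon_n < d_n(\theta, \theta_0) \le 2j\varepsilon_n\}, \alpha, e_n\big)^{K_3} \le e^{c_1 j^2 n\varepsilon_n^2}\, \Pi_n\big(B_n(\theta_0, \varepsilon_n; k)\big)^{\alpha} \qquad (3)$$

for all sufficiently large integers $j$ and $n$. Then for each $r_n \to \infty$ we have that

$$\Pi_n\big(\theta \in \Theta_n : d_n(\theta, \theta_0) \ge r_n\varepsilon_n \mid X^{(n)}\big) \to 0$$

in probability as $n \to \infty$. If furthermore there exists $c_2 > 1$ such that

$$\frac{e^{c_2 n\varepsilon_n^2}\, \Pi_n(\Theta \setminus \Theta_n)}{\Pi_n\big(B_n(\theta_0, \varepsilon_n; k)\big)} \to 0 \quad\text{as } n \to \infty,$$

then

$$\Pi_n\big(\theta \in \Theta : d_n(\theta, \theta_0) \ge r_n\varepsilon_n \mid X^{(n)}\big) \to 0$$

in probability as $n \to \infty$.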

Similarly, Theorem 3 holds if one replaces the inequality (3) by

$$C(\varepsilon_n, \Theta_n, \alpha, e_n)^{K_3} \le e^{c_1 n\varepsilon_n^2}\, \Pi_n\big(B_n(\theta_0, \varepsilon_n; k)\big)^{\alpha} \quad\text{for large } n.$$

Moreover, as a consequence of Theorem 3 we obtain the following result, which is a slightly stronger version of Theorem 1 in Ghosal and van der Vaart [5].

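Corollary 3. Suppose that Assumption 2 holds and that $k > 1$, $\varepsilon_n > 0$, $n\varepsilon_n^2 \ge c_0$ for all large $n$ and some fixed constant $c_0 > 0$. Suppose that there exist constants $c_1, c_2 > 0$ with $c_1(1 - \alpha) + c_2\alpha < K_2$, $c_3 > 1$ and a sequence of subsets $\Theta_n$ of $\Theta$ such that for all large $j$ and $n$,

(i) $N\big(j\varepsilon_n, \{\theta \in \Theta_n : j\varepsilon_n < d_n(\theta, \theta_0) \le 2j\varepsilon_n\}, e_n\big)^{K_3} \le e^{c_1 j^2 n\varepsilon_n^2}$;

(ii) $\Pi_n\big(\theta \in \Theta_n : j\varepsilon_n < d_n(\theta, \theta_0) \le 2j\varepsilon_n\big)^{K_3} \le e^{c_2 j^2 n\varepsilon_n^2}\, \Pi_n\big(B_n(\theta_0, \varepsilon_n; k)\big)$;

(iii) $\displaystyle\frac{e^{c_3 n\varepsilon_n^2}\, \Pi_n(\Theta \setminus \Theta_n)}{\Pi_n\big(B_n(\theta_0, \varepsilon_n; k)\big)} \to 0$ as $n \to \infty$.

Then for each $r_n \to \infty$ we have that

$$\Pi_n\big(\theta \in \Theta : d_n(\theta, \theta_0) \ge r_n\varepsilon_n \mid X^{(n)}\big) \to 0$$

in probability as $n \to \infty$.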

3 Some Special Cases 

In this section we apply our general convergence rate theorems to i.n.i.d. observations and Markov processes. For i.n.i.d. observations we establish almost sure convergence rate theorems both for pseudoposterior distributions and for posterior distributions. We also derive an almost sure posterior convergence rate theorem for general Markov processes.

3.1 Independent observations 

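We consider the case that $X^{(n)}$ is a random vector $(X_1, X_2, \ldots, X_n)$ of independent variables $X_i$, where each $X_i$ is generated from some density $p_{\theta,i}$ relative to a $\sigma$-finite measure $\mu_i$ on $(\mathfrak{X}_i, \mathcal{A}_i)$, and that $P_\theta^{(n)}$ is the product distribution with the density $p_\theta^{(n)}(x^{(n)}) = \prod_{i=1}^n p_{\theta,i}(x_i)$ relative to the product measure $\mu^{(n)} = \mu_1 \times \mu_2 \times \cdots \times \mu_n$ on $\mathfrak{X}^{(n)} = \mathfrak{X}_1 \times \mathfrak{X}_2 \times \cdots \times \mathfrak{X}_n$. Assume that $d_n^0(\theta_1, \theta_2) = \big(\tfrac{1}{n}\sum_{i=1}^n H_i(p_{\theta_1,i}, p_{\theta_2,i})^2\big)^{1/2}$, where each $H_i(p_{\theta_1,i}, p_{\theta_2,i}) = \big(\int (\sqrt{p_{\theta_1,i}} - \sqrt{p_{\theta_2,i}})^2\,d\mu_i\big)^{1/2}$ stands for the Hellinger distance between $p_{\theta_1,i}$ and $p_{\theta_2,i}$ relative to $\mu_i$ on $\mathfrak{X}_i$. It is clear that $d_n^0$ satisfies the triangle inequality and hence is a metric on $\Theta$. Denote

$$H_{*,i}(p_{\theta_1,i}, p_{\theta_2,i}) = \Big(\int \big(\sqrt{p_{\theta_1,i}} - \sqrt{p_{\theta_2,i}}\big)^2 \Big(\tfrac{2}{3}\sqrt{p_{\theta_1,i}/p_{\theta_2,i}} + \tfrac{1}{3}\Big)\,d\mu_i\Big)^{1/2}.$$

An advantage of adopting $H_*$ in the computation of concentration rates for independent observations is that we have the following relation

$$1 + \tfrac{3}{2} H_*\big(p_{\theta_0}^{(n)}, p_\theta^{(n)}\big)^2 = \prod_{i=1}^n \int \frac{p_{\theta_0,i}^{3/2}}{p_{\theta,i}^{1/2}}\,d\mu_i = \prod_{i=1}^n \Big(1 + \tfrac{3}{2} H_{*,i}(p_{\theta_0,i}, p_{\theta,i})^2\Big) \le \exp\Big(\tfrac{3}{2}\sum_{i=1}^n H_{*,i}(p_{\theta_0,i}, p_{\theta,i})^2\Big),$$

which implies that $W_n(\theta_0, \varepsilon)$ contains the set

$$\overline{W}_n(\theta_0, \varepsilon) := \Big\{\theta \in \Theta : \frac{1}{n}\sum_{i=1}^n H_{*,i}(p_{\theta_0,i}, p_{\theta,i})^2 \le \varepsilon^2\Big\}.$$

Similarly, we have

$$\int \sqrt{p_{\theta_1}^{(n)}\, p_{\theta_2}^{(n)}}\;d\mu^{(n)} = \prod_{i=1}^n \int \sqrt{p_{\theta_1,i}\, p_{\theta_2,i}}\;d\mu_i = \prod_{i=1}^n \Big(1 - \tfrac{1}{2} H_i(p_{\theta_1,i}, p_{\theta_2,i})^2\Big) \le \exp\Big(-\tfrac{1}{2}\sum_{i=1}^n H_i(p_{\theta_1,i}, p_{\theta_2,i})^2\Big),$$

which implies that the metric $d_n^0$ satisfies the inequality (1), and hence by the convexity of $(d_n^0)^2$ one can apply Proposition 2 and Proposition 3 for $d_n^0$. Now we are ready to present two results for i.n.i.d. observations by means of $\overline{W}_n(\theta_0, \varepsilon)$ and $d_n^0$.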
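The factorization of the Hellinger affinity over independent coordinates, and the resulting bound on the averaged squared Hellinger distance, can be verified on a small discrete example. This is a sketch with made-up probability vectors for three independent coordinates. It checks that $\int\sqrt{p_{\theta_1}^{(n)} p_{\theta_2}^{(n)}} = \prod_i (1 - H_i^2/2) \le \exp(-n\,d^2/2)$, where $d^2$ is the average of the $H_i^2$.

```python
import math


def affinity(f, g):
    """Hellinger affinity: sum of sqrt(f * g) over the sample space."""
    return sum(math.sqrt(a * b) for a, b in zip(f, g))


def product_density(factors):
    """Joint pmf of independent coordinates, flattened over the product space."""
    joint = [1.0]
    for f in factors:
        joint = [w * a for w in joint for a in f]
    return joint


# Hypothetical coordinate-wise densities under two parameter values.
p1 = [[0.5, 0.5], [0.2, 0.8], [0.6, 0.4]]
p2 = [[0.4, 0.6], [0.3, 0.7], [0.9, 0.1]]
n = len(p1)

joint_affinity = affinity(product_density(p1), product_density(p2))
factor_affinities = [affinity(f, g) for f, g in zip(p1, p2)]
h2 = [2 - 2 * a for a in factor_affinities]    # squared Hellinger distances H_i^2
d_n_sq = sum(h2) / n                           # averaged squared distance
```

Both joint vectors are built with the same nesting order, so corresponding entries refer to the same point of the product space.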

3.1.1 Pseudoposterior Convergence Rate. Given $0 < \beta < 1$, we define a pseudoposterior distribution $\Pi_{\beta,n}$ based on the prior $\Pi_n$ by

$$\Pi_{\beta,n}(B \mid X_1, X_2, \ldots, X_n) = \frac{\int_B \prod_{i=1}^n p_{\theta,i}(X_i)^{\beta}\,\Pi_n(d\theta)}{\int_\Theta \prod_{i=1}^n p_{\theta,i}(X_i)^{\beta}\,\Pi_n(d\theta)} \quad\text{for each } B \subset \Theta.$$

In other words, we use the data-dependent prior $\Pi_n(d\theta) / \prod_{i=1}^n p_{\theta,i}(X_i)^{1-\beta}$. Wasserman [15] first applied data-dependent priors based on a pseudolikelihood function in the study of asymptotic inference for mixture models. The pseudoposterior $\Pi_{\beta,n}$ for i.i.d. observations was introduced by Walker and Hjort [13], who proved a Hellinger consistency theorem when $\beta = 1/2$. The Hellinger consistency theorem for any $0 < \beta < 1$ was obtained by Xing and Ranneby [19]. Here we study the convergence rates of the pseudoposteriors for i.n.i.d. observations. Using Proposition 3 for $d_n^0$, we obtain

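Proposition 4. The inequality

$$P_{\theta_0}^{(n)}\Big(\int_{\theta \in \Theta_1:\ d_n^0(\theta, \theta_0) > \varepsilon} R_\theta^{(n)}(X^{(n)})^{\beta}\,\Pi_n(d\theta)\Big)^{\alpha} \le e^{-((1-\beta)\wedge\beta)\,\alpha n\varepsilon^2}\; \Pi_n\big(\theta \in \Theta_1 : d_n^0(\theta, \theta_0) > \varepsilon\big)^{\alpha}$$

holds for all $n$, $0 < \alpha < 1$, $0 < \beta < 1$, $\varepsilon > 0$ and $\Theta_1 \subset \Theta$.

Therefore, we have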
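Theorem 4. Let $0 < \beta < 1$. Suppose that $\varepsilon_n > 0$, $n\varepsilon_n^2 \ge c_0 \log n$ for all large $n$ and some fixed constant $c_0 > 0$. Suppose that there exists $c_1 > 0$ such that

$$\Pi_n\big(\overline{W}_n(\theta_0, \varepsilon_n)\big) \ge e^{-c_1 n\varepsilon_n^2}$$

for all large $n$. Then for each large $r$,

$$\Pi_{\beta,n}\big(\theta \in \Theta : d_n^0(\theta, \theta_0) \ge r\varepsilon_n \mid X_1, X_2, \ldots, X_n\big) \to 0$$

almost surely as $n \to \infty$.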

Since the total mass of $\Pi_n$ is always equal to one, Theorem 4 implies that the convergence rate $\varepsilon_n$ of the pseudoposterior distribution $\Pi_{\beta,n}$ is completely determined by the concentration condition $\Pi_n(\overline{W}_n(\theta_0, \varepsilon_n)) \ge e^{-c_1 n\varepsilon_n^2}$. In other words, the convergence rate does not depend on the rate of the metric entropy, which describes how large the model is.
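On a finite parameter grid the pseudoposterior is simply the prior reweighted by the likelihood raised to the power $\beta$. The sketch below is illustrative only: the function name and the two-point example are assumptions, and $\beta = 1$ is included just to compare against the ordinary posterior (the results above concern $0 < \beta < 1$).

```python
import math


def pseudoposterior(log_lik, prior, beta):
    """Pseudoposterior weights: prior times likelihood**beta, normalized.

    log_lik : log-likelihood of the data at each grid point
    prior   : prior weight of each grid point
    beta    : tempering power; beta = 1 recovers the usual posterior
    """
    m = max(log_lik)  # subtract the maximum for numerical stability
    unnorm = [p * math.exp(beta * (l - m)) for l, p in zip(log_lik, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Two-point example: the second parameter value is 9 times more likely.
log_lik = [0.0, math.log(9.0)]
prior = [0.5, 0.5]
post = pseudoposterior(log_lik, prior, beta=1.0)     # ordinary posterior
pseudo = pseudoposterior(log_lik, prior, beta=0.5)   # Walker-Hjort type choice
```

With $\beta = 1/2$ the likelihood ratio 9 is flattened to 3, so the pseudoposterior is less concentrated than the posterior on the same data.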

3.1.2 Posterior Convergence Rate. By a result of Birgé (see [6], page 491, or [5], Lemma 2) we know that there exist tests satisfying Assumption 1. Based on this fact, Ghosal and van der Vaart ([5], Theorem 4) gave an in-probability convergence rate theorem for i.n.i.d. observations and the metric $d_n^0$. Now, together with Proposition 2 and $\overline{W}_n(\theta_0, \varepsilon) \subset W_n(\theta_0, \varepsilon)$, Theorem 1 implies the following almost sure assertion.

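Theorem 5. Let $0 < \delta < 1/2$ and $0 < \alpha < 1$. Suppose that $\varepsilon_n > 0$, $n\varepsilon_n^2 \ge c_0 \log n$ for all large $n$ and some fixed constant $c_0 > 0$. Suppose that there exist $c_1 < \tfrac{1}{2}(1 - \alpha)(1 - 2\delta)^2$, $c_2 > 1/c_0$ and a sequence of subsets $\Theta_n$ of $\Theta$ such that

$$C\big(\delta j\varepsilon_n, \{\theta \in \Theta_n : j\varepsilon_n < d_n^0(\theta, \theta_0) \le 2j\varepsilon_n\}, \alpha, d_n^0\big) \le e^{c_1 j^2 n\varepsilon_n^2}\, \Pi_n\big(\overline{W}_n(\theta_0, \varepsilon_n)\big)^{\alpha}$$

for all large $j$, $n$, and

$$\sum_{n=1}^{\infty} \frac{e^{n\varepsilon_n^2(3 + 2c_2)}\, \Pi_n(\Theta \setminus \Theta_n)}{\Pi_n\big(\overline{W}_n(\theta_0, \varepsilon_n)\big)} < \infty.$$

Then there exists $b > 0$ such that for each large $r$ and all large $n$,

$$\Pi_n\big(\theta \in \Theta : d_n^0(\theta, \theta_0) \ge r\varepsilon_n \mid X^{(n)}\big) \le e^{-bn\varepsilon_n^2} \quad\text{almost surely.}$$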

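For the reader's convenience, we state here a direct consequence of Theorem 5 for $\alpha = 1/2$.

Corollary 4. Let $0 < \delta < 1/2$. Suppose that $\varepsilon_n > 0$, $n\varepsilon_n^2 \ge c_0 \log n$ for all large $n$ and some fixed constant $c_0 > 0$. Suppose that there exist $c_1, c_2, c_3$ with $c_1 + c_2 < \tfrac{1}{2}(1 - 2\delta)^2$ and $c_3 > 1/c_0$ and a sequence of subsets $\Theta_n$ of $\Theta$ such that for all large $j$ and $n$,

(i) $N\big(\delta j\varepsilon_n, \{\theta \in \Theta_n : j\varepsilon_n < d_n^0(\theta, \theta_0) \le 2j\varepsilon_n\}, d_n^0\big) \le e^{c_1 j^2 n\varepsilon_n^2}$;

(ii) $\Pi_n\big(\theta \in \Theta_n : j\varepsilon_n < d_n^0(\theta, \theta_0) \le 2j\varepsilon_n\big) \le e^{c_2 j^2 n\varepsilon_n^2}\, \Pi_n\big(\overline{W}_n(\theta_0, \varepsilon_n)\big)$;

(iii) $\displaystyle\sum_{n=1}^{\infty} \frac{e^{n\varepsilon_n^2(3 + 2c_3)}\, \Pi_n(\Theta \setminus \Theta_n)}{\Pi_n\big(\overline{W}_n(\theta_0, \varepsilon_n)\big)} < \infty$.

Then there exists $b > 0$ such that for each large $r$ and all large $n$,

$$\Pi_n\big(\theta \in \Theta : d_n^0(\theta, \theta_0) \ge r\varepsilon_n \mid X^{(n)}\big) \le e^{-bn\varepsilon_n^2} \quad\text{almost surely.}$$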
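3.2 Markov chains

Let $X_0, X_1, \ldots$ be a Markov chain with transition density $p_\theta(y|x)$ and initial density $q_\theta(x_0)$ with respect to some $\sigma$-finite measure $\mu$ on a measurable space $(\mathfrak{X}, \mathcal{A})$. Here we assume that for each $\theta \in \Theta$ the 2-variable function $p_\theta(y|x)$ is measurable. So the joint distribution $P_\theta^{(n)}$ of $X_0, X_1, \ldots, X_n$ has a density given by $p_\theta^{(n)}(x^{(n)}) = q_\theta(x_0)\prod_{i=1}^n p_\theta(x_i|x_{i-1})$ relative to the product measure $\mu(x_0)\mu(x_1)\cdots\mu(x_n)$. We shall adopt the following Hellinger type semimetrics, where $\nu$ is a measure on $\mathfrak{X}$:

$$H\big(p_{\theta_1}(y|x), p_{\theta_2}(y|x)\big) = \Big(\int\!\!\int \big(\sqrt{p_{\theta_1}(y|x)} - \sqrt{p_{\theta_2}(y|x)}\big)^2\,d\mu(y)\,d\nu(x)\Big)^{1/2},$$

$$H\big(q_{\theta_1}(x), q_{\theta_2}(x)\big) = \Big(\int \big(\sqrt{q_{\theta_1}(x)} - \sqrt{q_{\theta_2}(x)}\big)^2\,d\mu(x)\Big)^{1/2},$$

$$H_*\big(p_{\theta_1}(y|x), p_{\theta_2}(y|x)\big) = \Big(\int\!\!\int \big(\sqrt{p_{\theta_1}(y|x)} - \sqrt{p_{\theta_2}(y|x)}\big)^2 \Big(\tfrac{2}{3}\sqrt{\frac{p_{\theta_1}(y|x)}{p_{\theta_2}(y|x)}} + \tfrac{1}{3}\Big)\,d\mu(y)\,d\nu(x)\Big)^{1/2},$$

$$H_*\big(q_{\theta_1}(x), q_{\theta_2}(x)\big) = \Big(\int \big(\sqrt{q_{\theta_1}(x)} - \sqrt{q_{\theta_2}(x)}\big)^2 \Big(\tfrac{2}{3}\sqrt{\frac{q_{\theta_1}(x)}{q_{\theta_2}(x)}} + \tfrac{1}{3}\Big)\,d\mu(x)\Big)^{1/2}.$$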
Denote

By means of the metric $d(\theta, \theta_0) := H(p_\theta, p_{\theta_0})$, Ghosal and van der Vaart ([5], Theorem 5) gave an in-probability posterior convergence rate theorem for stationary α-mixing Markov chains. Since calculation of the α-mixing coefficients is generally not easy and many processes are neither mixing nor stationary, it seems worthwhile to develop a posterior convergence rate theorem for Markov chains which may be neither stationary nor α-mixing. Now we have an almost sure assertion in this direction. Our result is based on the following proposition.

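Proposition 5. Suppose that there exist a $\mu$-integrable function $r(y)$ and constants $a_1 \ge a_0 > 0$ with $a_1 \ge 1$ such that $d\nu(y) = r(y)\,d\mu(y)$ and $a_0 r(y) \le p_\theta(y|x) \le a_1 r(y)$ for all $\theta \in \Theta$ and $x, y \in \mathfrak{X}$. Let $0 < \delta < \frac{\sqrt{a_0}}{2\sqrt{a_1}}$ and $0 < \alpha < \frac{1}{2}$. Then the inequality

$$P_{\theta_0}^{(n)}\Big(\int_{\theta \in \Theta_1:\ d(\theta, \theta_0) > \varepsilon} \frac{q_\theta(X_0)\prod_{i=1}^n p_\theta(X_i|X_{i-1})}{q_{\theta_0}(X_0)\prod_{i=1}^n p_{\theta_0}(X_i|X_{i-1})}\,\Pi_n(d\theta)\Big)^{\alpha} \le 2\, e^{-(\frac{1}{2} - \alpha)\left(\frac{\sqrt{a_0}}{2} - \sqrt{a_1}\,\delta\right)^2 n\varepsilon^2}\, C\big(\delta\varepsilon, \{\theta \in \Theta_1 : d(\theta, \theta_0) > \varepsilon\}, \alpha, d\big)$$

holds for all $n$, $\varepsilon > 0$ and $\Theta_1 \subset \Theta$, where $d(\theta, \theta_0) = H(p_\theta, p_{\theta_0})$.

Therefore we have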

Theorem 6. Suppose that all assumptions of Proposition\^hold and suppose 
that En > 0, nef^ > cq logn for all large n and some fixed constant cq > 0. 
Suppose that there exist ci < — a){^^ — i/ai(5)^, C2 > ^ and a sequence 
of subsets @n on G such that 

C{6jen,{0 e e„ : jsn < d{9,9o) < 2jen},a,d) < e^^^''^^" n„(Ty„i(0o, 

for all large j, n, and 

Then there exists $b > 0$ such that for each large $r$ and all large $n$,

$$\Pi_n\big(\theta \in \Theta : d(\theta, \theta_0) \ge r\varepsilon_n \mid X_0, X_1, \ldots, X_n\big) \le e^{-bn\varepsilon_n^2} \quad\text{almost surely.}$$
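By choosing $\delta = \frac{\sqrt{a_0}}{4\sqrt{a_1}}$ and $\alpha = \frac{1}{4}$ we can easily get

Corollary 5. Suppose that there exist a $\mu$-integrable function $r(y)$ and constants $a_1 \ge a_0 > 0$ such that $d\nu(y) = r(y)\,d\mu(y)$ and $a_0 r(y) \le p_\theta(y|x) \le a_1 r(y)$ for all $\theta \in \Theta$ and $x, y \in \mathfrak{X}$. Suppose that $\varepsilon_n > 0$, $n\varepsilon_n^2 \ge c_0 \log n$ for all large $n$ and some fixed constant $c_0 > 0$. Suppose that there exist $c_1, c_2, c_3$ with $3c_1 + c_2 < a_0/16$ and $c_3 > 1/c_0$ and a sequence of subsets $\Theta_n$ of $\Theta$ such that for all large $j$ and $n$,

(i) $N\big(\tfrac{\sqrt{a_0}}{4\sqrt{a_1}}\, j\varepsilon_n, \{\theta \in \Theta_n : j\varepsilon_n < d(\theta, \theta_0) \le 2j\varepsilon_n\}, d\big) \le e^{c_1 j^2 n\varepsilon_n^2}$;

(ii) $\Pi_n\big(\theta \in \Theta_n : j\varepsilon_n < d(\theta, \theta_0) \le 2j\varepsilon_n\big) \le e^{c_2 j^2 n\varepsilon_n^2}\, \Pi_n\big(W_n^1(\theta_0, \varepsilon_n)\big)$;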

(^n) £ (3^4.3) n„(e\e„) ^ 
n„(Tyi(eo,en)) 

Then there exists $b > 0$ such that for each large $r$ and all large $n$,

$$\Pi_n\big(\theta \in \Theta : d(\theta, \theta_0) \ge r\varepsilon_n \mid X_0, X_1, \ldots, X_n\big) \le e^{-bn\varepsilon_n^2} \quad\text{almost surely.}$$

4 Applications 

In this section we give three examples of applications of our theorems. By means of Corollary 5 we improve on the posterior rate of convergence for the nonlinear autoregressive model in Ghosal and van der Vaart [5]. Corollary 1 is applied to find the posterior convergence rate for an infinite-dimensional normal model, which extends the known results in Ghosal and van der Vaart [5], Scricciolo [8], Shen and Wasserman [9] and Zhao [20] for the white noise model with a conjugate prior. Finally, we use Corollary 5 to study priors based on uniform distributions, which extends the corresponding result for priors based on discrete distributions in Ghosal and van der Vaart [5].

4.1. Nonlinear autoregression. We observe $X_1, X_2, \ldots, X_n$ of a time series $\{X_t : t \in \mathbb{Z}\}$ given by

$$X_i = f(X_{i-1}) + \varepsilon_i \quad\text{for } i = 1, 2, \ldots, n,$$

where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are i.i.d. random variables with the standard normal distribution and the unknown regression function $f$ is in the space $\mathcal{F}$ which consists of all functions $f$ with $\sup_{x \in \mathbb{R}} |f(x)| \le M$ for some fixed positive constant $M$. Let $q_f(x)$ be the density of $X_0$ relative to the Lebesgue measure $\mu$ on $\mathbb{R}$. So $X_0, X_1, \ldots$ can be considered as a Markov chain generated by the transition density $p_f(y|x) = \phi(y - f(x))$ with $\phi(x) = (2\pi)^{-1/2} e^{-x^2/2}$ and the initial density $q_f(x)$. Since $\phi(x)$ is a strictly positive continuous function tending to zero as $x \to \pm\infty$, there exist two constants $0 < a_0 < 1 < a_1$ depending only on $M$ such that $a_0\phi(y) \le p_f(y|x) \le a_1\phi(y)$ for all $f \in \mathcal{F}$ and $-\infty < y, x < \infty$. Assume that $H_*(q_{f_1}, q_{f_2})$ is bounded by a fixed positive constant for all initial densities $q_{f_1}$ and $q_{f_2}$ of the Markov chain. For instance, all of the initial densities with $a_0\phi(x) \le q_f(x) \le a_1\phi(x)$ satisfy $H_*(q_{f_1}, q_{f_2}) \le \sqrt{2}\,(a_1/a_0)^{1/4}$ and hence form a set meeting this requirement. Define a measure $d\nu = \phi\,d\mu$ on $\mathbb{R}$ and a norm $\|f\|_2 = \big(\int_{\mathbb{R}} |f|^2\,d\nu\big)^{1/2}$ on $\mathcal{F}$. Assume that the true regression function $f_0 \in \mathcal{F}$ belongs to the Lipschitz continuous space $\mathrm{Lip}_L$, which consists of all functions $f$ on $(-\infty, \infty)$ satisfying $|f(x) - f(y)| \le L|x - y|$ for all $-\infty < x, y < \infty$, where $L$ is a fixed positive constant. When the Markov chain is stationary, Ghosal and van der Vaart ([5], Section 7.4) constructed a prior on the regression functions and obtained the in-probability posterior convergence rate $n^{-1/3}(\log n)^{1/2}$, which is the minimax rate times the logarithmic factor $(\log n)^{1/2}$. In the following we shall apply Corollary 5 to get the posterior convergence rate $n^{-1/3}(\log n)^{1/3}$ in the almost sure sense for a general Markov chain defined as above.
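To make the model concrete, the following sketch simulates the chain $X_i = f(X_{i-1}) + \varepsilon_i$ and evaluates the transition part of the log-likelihood, $\sum_i \log\phi(X_i - f(X_{i-1}))$. The choices $f_0 = \sin$ (bounded by $M = 1$ and Lipschitz with $L = 1$), the starting point $X_0 = 0$ and the competing function $f \equiv 0$ are illustrative assumptions, and the initial density $q_f$ is left out of the likelihood.

```python
import math
import random


def simulate_chain(f, x0, n, rng):
    """Simulate X_0, ..., X_n with X_i = f(X_{i-1}) + standard normal noise."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]) + rng.gauss(0.0, 1.0))
    return xs


def log_phi(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def transition_log_lik(f, xs):
    """Sum of log phi(X_i - f(X_{i-1})); the initial density term is omitted."""
    return sum(log_phi(y - f(x)) for x, y in zip(xs, xs[1:]))


rng = random.Random(1)
f0 = math.sin                    # hypothetical true regression function


def f1(x):
    """Hypothetical competitor: the zero function."""
    return 0.0


xs = simulate_chain(f0, 0.0, 2000, rng)
ll_true = transition_log_lik(f0, xs)
ll_alt = transition_log_lik(f1, xs)
```

With a few thousand observations the true regression function should typically attain a clearly larger log-likelihood than the competitor, which is the mechanism that drives posterior concentration around $f_0$.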
First, we note that for any $f \in \mathcal{F}$,

H*{Ph,Pff + ^H4qf^,qff < .f^H{pfo,pff + ^ 
n \ clq n 

_(JM^M^\ \\f-fo\\ 
I — e 4 I du^x) H < 




where the last inequality follows from the elementary inequality $1 - e^{-t} \le t$. Hence for some small constant $b_1 > 0$ we have that $W_n^1(f_0, \varepsilon_n) \supset \{f \in \mathcal{F} : \|f - f_0\|_2 \le b_1\varepsilon_n\}$ for all large $n$. Similarly, $\|f - f_0\|_2 \approx H(p_f, p_{f_0})$ holds for all $f \in \mathcal{F}$ with $\|f - f_0\|_2 \lesssim 1$. Hence Corollary 5 works well for the metric $\|\cdot\|_2$.
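We also need some basic facts on approximation of Lipschitz continuous functions by means of step functions. Given a finite interval $[-A_n, A_n)$ and a positive integer $K_n$, we make the partition $[-A_n, A_n) = \bigcup_{k=1}^{K_n} I_k$ with $I_k = \big[-A_n + \tfrac{2(k-1)A_n}{K_n},\ -A_n + \tfrac{2kA_n}{K_n}\big)$ for $k = 1, 2, \ldots, K_n$. Write $I_0 = \mathbb{R} \setminus [-A_n, A_n)$. The space of step functions relative to the partition is the set of functions $h : [-A_n, A_n) \to \mathbb{R}$ such that $h$ is identically equal to some constant on each $I_k$ for $k = 1, 2, \ldots, K_n$; more precisely, $h(x) = \sum_{k=1}^{K_n} \beta_k 1_{I_k}(x)$ for some $\beta = (\beta_1, \beta_2, \ldots, \beta_{K_n}) \in [-M, M]^{K_n} \subset \mathbb{R}^{K_n}$, where $1_{I_k}(x)$ denotes the indicator function of $I_k$. Denote by $f_\beta(x)$ the function on $(-\infty, \infty)$ which is equal to $\sum_{k=1}^{K_n} \beta_k 1_{I_k}(x)$ on $[-A_n, A_n)$ and vanishes outside $[-A_n, A_n)$. Hence $f_\beta \in \mathcal{F}$ and $\|f_{\beta_1} - f_{\beta_2}\|_2 = \|\beta_1 - \beta_2\|_*$, where $\|\beta\|_* = \big(\sum_{k=1}^{K_n} \beta_k^2 \int_{I_k} d\nu\big)^{1/2}$. Let $\Pi_n$ be the prior on $\mathcal{F}$ which is induced by the map $\beta \mapsto f_\beta$ such that all the coordinates $\beta_k$ of $\beta$ are chosen to be i.i.d. random variables with the uniform distribution on $[-M, M]$. Hence the support $\mathcal{F}_n$ of $\Pi_n$ consists of all such functions $f_\beta$. Take $A_n = 2\sqrt{\log(1/\varepsilon_n)} \approx \sqrt{\log n}$ and $K_n = \big\lfloor \tfrac{3LA_n}{b_1\varepsilon_n} \big\rfloor + 1$ with $\varepsilon_n = \big(\tfrac{\log n}{n}\big)^{1/3}$. Then $K_n \lesssim (n\log n)^{1/3} \lesssim n\varepsilon_n^2$. Write $\beta_0 = (\beta_{0,1}, \beta_{0,2}, \ldots, \beta_{0,K_n})$ for $\beta_{0,k} = f_0\big(-A_n + \tfrac{(2k-1)A_n}{K_n}\big)$. Since $f_0 \in \mathcal{F} \cap \mathrm{Lip}_L$, we have that $f_{\beta_0} \in \mathcal{F}_n$ and $\sup_{-A_n \le x < A_n} |f_{\beta_0}(x) - f_0(x)| \le LA_n/K_n \le b_1\varepsilon_n/3$.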
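The step-function approximation can be illustrated directly. The sketch below assumes that each coefficient is the value of $f_0$ at the midpoint of its cell, which is one natural reading consistent with the stated bound $LA_n/K_n$ (each cell has half-width $A_n/K_n$). The concrete values $f_0 = \sin$, $A = 3$, $K = 60$ and $L = 1$ are illustrative choices, not taken from the paper.

```python
import math


def step_approx(f0, A, K):
    """Step function on [-A, A) with K equal cells, equal to f0 at cell midpoints.

    The returned function vanishes outside [-A, A).
    """
    width = 2 * A / K
    betas = [f0(-A + (k + 0.5) * width) for k in range(K)]

    def f_beta(x):
        if x < -A or x >= A:
            return 0.0
        k = min(int((x + A) / width), K - 1)   # index of the cell containing x
        return betas[k]

    return f_beta


f0 = math.sin                 # bounded by 1 and Lipschitz with constant L = 1
A, K, L = 3.0, 60, 1.0
f_beta = step_approx(f0, A, K)

# Approximate the sup-norm error on [-A, A) with a fine grid.
grid = [-A + 2 * A * i / 10000 for i in range(10000)]
sup_err = max(abs(f_beta(x) - f0(x)) for x in grid)
```

Since every point of a cell is within $A/K$ of the midpoint, the Lipschitz condition gives a sup-norm error of at most $LA/K$ on $[-A, A)$; the grid evaluation only approximates the supremum from below.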
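From the triangle inequality and the inequality $\int_x^{\infty} \phi(t)\,dt \le \phi(x)/x$ for all $x > 0$, it follows that for all $f_\beta \in \mathcal{F}_n$ and for all large $n$,

$$\Big|\, \|f_\beta - f_0\|_2 - \|f_\beta - f_{\beta_0}\|_2 \Big| \le \|f_{\beta_0} - f_0\|_2 = \Big(\int |f_0 - f_{\beta_0}|^2\,d\nu\Big)^{1/2} \le \frac{b_1\varepsilon_n}{3} + \frac{\sqrt{2}\,M\varepsilon_n}{(2\pi)^{1/4} A_n^{1/2}} \le \frac{b_1\varepsilon_n}{2}.$$

Thus for all large $j$ and $n$, we have

$$\frac{\Pi_n\big(f_\beta \in \mathcal{F}_n : j\varepsilon_n < \|f_\beta - f_0\|_2 \le 2j\varepsilon_n\big)}{\Pi_n\big(W_n^1(f_0, \varepsilon_n)\big)} \le \frac{\Pi_n\big(f_\beta \in \mathcal{F}_n : \|f_\beta - f_0\|_2 \le 2j\varepsilon_n\big)}{\Pi_n\big(f_\beta \in \mathcal{F}_n : \|f_\beta - f_0\|_2 \le b_1\varepsilon_n\big)}$$

$$\le \frac{\Pi_n\big(f_\beta \in \mathcal{F}_n : \|f_\beta - f_{\beta_0}\|_2 \le 3j\varepsilon_n\big)}{\Pi_n\big(f_\beta \in \mathcal{F}_n : \|f_\beta - f_{\beta_0}\|_2 \le \tfrac{b_1}{2}\varepsilon_n\big)} = \frac{\Pi_n\big(\beta \in [-M, M]^{K_n} : \|\beta - \beta_0\|_* \le 3j\varepsilon_n\big)}{\Pi_n\big(\beta \in [-M, M]^{K_n} : \|\beta - \beta_0\|_* \le \tfrac{b_1}{2}\varepsilon_n\big)}.$$

Note that the Euclidean volume of the $K_n$-dimensional ellipsoid $\{\beta \in \mathbb{R}^{K_n} : \|\beta - \beta_0\|_* \le r\}$ is equal to $r^{K_n}$ times the Euclidean volume of the "unit" $K_n$-dimensional ellipsoid $\{\beta \in \mathbb{R}^{K_n} : \|\beta - \beta_0\|_* \le 1\}$. So the last quotient does not exceed $(6j/b_1)^{K_n} = e^{K_n\log(6j/b_1)}$, which is less than $e^{c_2 j^2 n\varepsilon_n^2}$ for any given $c_2 > 0$ and all large $j$. Hence we have obtained condition (ii) of Corollary 5. Similarly, for all large $j$ and $n$, we have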

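$$N\Big(\tfrac{\sqrt{a_0}}{4\sqrt{a_1}}\, j\varepsilon_n, \{f_\beta \in \mathcal{F}_n : j\varepsilon_n < \|f_\beta - f_0\|_2 \le 2j\varepsilon_n\}, \|\cdot\|_2\Big)$$

$$\le N\Big(\tfrac{\sqrt{a_0}}{4\sqrt{a_1}}\, j\varepsilon_n, \{f_\beta \in \mathcal{F}_n : \|f_\beta - f_{\beta_0}\|_2 \le 3j\varepsilon_n\}, \|\cdot\|_2\Big)$$

$$\le N\Big(\tfrac{\sqrt{a_0}}{4\sqrt{a_1}}\, j\varepsilon_n, \{\beta \in [-M, M]^{K_n} : \|\beta - \beta_0\|_* \le 3j\varepsilon_n\}, \|\cdot\|_*\Big),$$

which, by Lemma 4.1 in Pollard [7], is less than $b_2^{K_n} = e^{K_n\log b_2}$ for some constant $b_2 > 0$, and therefore condition (i) of Corollary 5 holds for any given $c_1 > 0$.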

4.2. Infinite-dimensional normal model. We observe an infinite-dimensional random vector $(X_1, X_2, \ldots)$, where the random vector $X^{(n)} = (X_1, \ldots, X_n)$ for each $n$ is normally distributed according to $N(\theta_{(n)}, \Sigma_{(n)})$ with density $p_\theta^{(n)}(x^{(n)})$, $\theta_{(n)} = (\theta_1, \ldots, \theta_n)$, and the covariance matrix $\Sigma_{(n)}$ is known and satisfies



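for all $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$ and for all $n$. The parameter space $\Theta$ consists of all vectors $\theta = (\theta_1, \theta_2, \ldots)$ in $\mathbb{R}^{\infty}$ with $\|\theta\|_2 := \big(\sum_{i=1}^{\infty} \theta_i^2\big)^{1/2} < \infty$. In this section we identify $\theta_{(n)} = (\theta_1, \ldots, \theta_n)$ with $(\theta_1, \ldots, \theta_n, 0, 0, \ldots)$ and hence the norm $\|\theta_{(n)}\|_2$ makes sense. Let $\gamma$ be a positive constant. The true parameter $\theta_0 = (\theta_{0,1}, \theta_{0,2}, \ldots)$ is assumed to satisfy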

(b)  $\sum_{i=1}^{\infty} \theta_{0,i}^2\, i^{2\gamma} < \infty.$

In the special case that $X_1, X_2, \ldots$ are independent random variables and each
$X_i$ is normally distributed with mean $\theta_i$ and variance $1/n$, the Bayesian
estimation problem on parameters $\theta = (\theta_1, \theta_2, \ldots)$ has been studied by many
authors including Cox [2], Freedman [3], Ghosal and van der Vaart [5], Scricciolo [8],
Shen and Wasserman [9] and Zhao [20]. They showed that posteriors
can attain the minimax rate $n^{-\gamma/(2\gamma+1)}$. Observe that every white noise model
can be described as an infinite-dimensional normal model via an orthonormal
basis.
Now we construct a prior such that the posterior attains the optimal rate
of convergence in our framework. We put the prior $\Pi_n$ on the parameter $\theta =
(\theta_1, \theta_2, \ldots)$ such that $\theta_{(k)} = (\theta_1, \ldots, \theta_k)$ is distributed as $N(0, \Sigma_k)$ and that
$\theta_{k+1}, \theta_{k+2}, \ldots$ are set to be zero, where $k = \lfloor n^{1/(2\gamma+1)} c \rfloor$ with some positive
constant $c$ which is determined later and the covariance matrix $\Sigma_k$ is assumed
to satisfy

1=1 

for all $a = (a_1, \ldots, a_k) \in \mathbb{R}^k$ and for all such $k$. For instance, the last
inequality holds if eigenvalues Ai < A2 < • • • < of positive definite matrices 
satisfy Aj < ki'^'^ for i = 1,2,..., A;, which for independent variables 
$X_1, X_2, \ldots$ is slightly weaker than the condition (7.8) given in Ghosal and van
der Vaart [5]. In the following we shall apply Corollary [1] to show that the
corresponding posterior converges at the rate $\varepsilon_n = n^{-\gamma/(2\gamma+1)}$.
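
To make the choice of the sieve dimension concrete, the following minimal sketch computes $k = \lfloor c\, n^{1/(2\gamma+1)} \rfloor$ and $\varepsilon_n = n^{-\gamma/(2\gamma+1)}$ and compares the two quantities that this choice balances: the variance-type term $k/n$ and the truncation-type term $k^{-2\gamma}$. The function names, the default $c = 1$ and the sample values of $n$ and $\gamma$ are assumptions made for illustration only.

```python
import math

def sieve_dimension(n, gamma, c=1.0):
    """k = floor(c * n^(1/(2*gamma+1))): number of coordinates given a prior."""
    return math.floor(c * n ** (1.0 / (2.0 * gamma + 1.0)))

def target_rate(n, gamma):
    """eps_n = n^(-gamma/(2*gamma+1)): the rate stated for the posterior."""
    return n ** (-gamma / (2.0 * gamma + 1.0))

# Assumed illustrative values.
n, gamma = 10**6, 1.0
k = sieve_dimension(n, gamma)
eps = target_rate(n, gamma)

# Both ratios are of order one, i.e. k/n and k^(-2*gamma) are of order eps^2.
variance_ratio = (k / n) / eps ** 2
truncation_ratio = k ** (-2.0 * gamma) / eps ** 2
```

With this choice both terms are of the same order as $\varepsilon_n^2$, which is the usual heuristic behind truncating the prior at dimension $k$.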

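Theorem 7. Assume that (a), (b) and (c) hold. Let $k = \lfloor n^{1/(2\gamma+1)} c \rfloor$ and
$\varepsilon_n = n^{-\gamma/(2\gamma+1)}$. Then there exist constants $c > 0$ and $r > 0$ such that

\[ \Pi_n\bigl(\theta \in \Theta : \|\theta - \theta_0\|_2 \ge r\varepsilon_n \mid X^{(n)}\bigr) \longrightarrow 0 \]

almost surely as $n \to \infty$.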

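Proof. For any $\alpha_1 = (\theta_{1,1}, \theta_{1,2}, \ldots, \theta_{1,n})$ and $\alpha_2 = (\theta_{2,1}, \theta_{2,2}, \ldots, \theta_{2,n})$ we have

\[
H(p_{\alpha_1}^{(n)}, p_{\alpha_2}^{(n)})^2 = 2 - 2\int \sqrt{p_{\alpha_1}^{(n)}(x)\, p_{\alpha_2}^{(n)}(x)}\, dx
= 2 - \frac{2}{(2\pi)^{n/2}\sqrt{\det \Sigma_{(n)}}} \int \exp\Bigl(-\tfrac14\bigl((x-\alpha_1)\Sigma_{(n)}^{-1}(x-\alpha_1)^T + (x-\alpha_2)\Sigma_{(n)}^{-1}(x-\alpha_2)^T\bigr)\Bigr)\, dx,
\]

where $x = (x_1, x_2, \ldots, x_n)$ and

\begin{align*}
&(x-\alpha_1)\Sigma_{(n)}^{-1}(x-\alpha_1)^T + (x-\alpha_2)\Sigma_{(n)}^{-1}(x-\alpha_2)^T \\
&\quad= 2x\Sigma_{(n)}^{-1}x^T - 2(\alpha_1+\alpha_2)\Sigma_{(n)}^{-1}x^T + \alpha_1\Sigma_{(n)}^{-1}\alpha_1^T + \alpha_2\Sigma_{(n)}^{-1}\alpha_2^T \\
&\quad= 2\Bigl(x-\tfrac{\alpha_1+\alpha_2}{2}\Bigr)\Sigma_{(n)}^{-1}\Bigl(x-\tfrac{\alpha_1+\alpha_2}{2}\Bigr)^T - \tfrac12(\alpha_1+\alpha_2)\Sigma_{(n)}^{-1}(\alpha_1+\alpha_2)^T + \alpha_1\Sigma_{(n)}^{-1}\alpha_1^T + \alpha_2\Sigma_{(n)}^{-1}\alpha_2^T \\
&\quad= 2\Bigl(x-\tfrac{\alpha_1+\alpha_2}{2}\Bigr)\Sigma_{(n)}^{-1}\Bigl(x-\tfrac{\alpha_1+\alpha_2}{2}\Bigr)^T + \tfrac12(\alpha_1-\alpha_2)\Sigma_{(n)}^{-1}(\alpha_1-\alpha_2)^T \\
&\quad\ge 2\Bigl(x-\tfrac{\alpha_1+\alpha_2}{2}\Bigr)\Sigma_{(n)}^{-1}\Bigl(x-\tfrac{\alpha_1+\alpha_2}{2}\Bigr)^T + b_1 n\|\alpha_1-\alpha_2\|_2^2
\end{align*}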
for some positive constant $b_1$ independent of $\alpha_1, \alpha_2$, where the last inequality
follows from condition (a). Hence we get

H{pl^^,pl^^f > 2-2e-T'"ll"i-'^2||i^ 

which implies that the norm $2^{-1} b_1 \|\cdot\|_2$ satisfies the inequality (1). So Corollary
[1] can be applied for the metric $2^{-1} b_1 \|\cdot\|_2$ and for constants $\alpha = 1/2$ and
$\delta = 1/4$.
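
Completing the square as in the computation above gives, for two normal densities with a common covariance matrix, the closed form $H^2 = 2 - 2\exp\bigl(-\tfrac18(\alpha_1-\alpha_2)\Sigma^{-1}(\alpha_1-\alpha_2)^T\bigr)$. The sketch below checks this identity numerically in the one-dimensional case only, where $\Sigma$ is a scalar variance; the function names, the integration grid and the sample means and variance are illustrative assumptions.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def hellinger_sq_numeric(m1, m2, var, half_width=12.0, steps=20000):
    """H^2 = 2 - 2 * integral of sqrt(p1 * p2), by the midpoint rule."""
    centre = 0.5 * (m1 + m2)
    sd = math.sqrt(var)
    lo = centre - half_width * sd
    h = 2.0 * half_width * sd / steps
    affinity = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        affinity += math.sqrt(normal_pdf(x, m1, var) * normal_pdf(x, m2, var)) * h
    return 2.0 - 2.0 * affinity

def hellinger_sq_closed(m1, m2, var):
    """Closed form 2 - 2*exp(-(m1 - m2)^2 / (8*var)) for equal variances."""
    return 2.0 - 2.0 * math.exp(-(m1 - m2) ** 2 / (8.0 * var))

# Assumed illustrative values: means 0.3 and 0.5, variance 0.01.
numeric = hellinger_sq_numeric(0.3, 0.5, 0.01)
closed = hellinger_sq_closed(0.3, 0.5, 0.01)
```

In higher dimensions the same identity holds with the quadratic form in place of $(m_1 - m_2)^2/\sigma^2$, so a bound on that quadratic form by a multiple of $n\|\alpha_1-\alpha_2\|_2^2$ translates directly into a bound on the Hellinger distance.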

It follows from condition {b) that ||6l(fc) - 6*0112 = Yli=i{^i - ^Oif + 

where $\theta_{(k)} = (\theta_1, \ldots, \theta_k)$ and $\theta_{0,(k)} = (\theta_{0,1}, \ldots, \theta_{0,k})$. This implies that for each
large $j$,

Ci^j£n,{0{k) ■■ i^n < ^\\S{k) - Oo\\2 < 2je„}, ^> yll • II2) 

1 1 

< C(-ie„,{6'(fc) : ||6'(fc) -6*0,(^)112 < 3ie„},-,|| • II2), 

which by Lemma 1 in Xing and Ranneby [18] does not exceed

^n{0{k) ■■ \\^{k)-Oo,{k)\\2 < 3jen)^N{^jen,{^ik) ■■ \\0(k)-0o,{k)\\2 < 3ie„}, IHb)^ 

<Un{e^k) ■■ \ \Oik) - 0O,(k)\\2 < SjSn)'^ bl 

< n„(^(,) : \\e^k) - Oo,(k)\\2 < ^jsn)-^ e^of-''^ 

for some constant $b_2 > 1$ and all large $j$, $n$, where we have applied Lemma 4.1
in Pollard [7]. It remains to prove that for large $j$ and $n$,

nn(^(fc) : ll^(fc) -^0,(fe)l|2 < SjEn) < 6^0 f^'' U^WniOo, Sn)) . 

By the proof of Lemma 1 in Xing [16] we have 

-, I 3 („) (™) \2 _ p / (n) / (n) _ 1 

( ~ ~ Go,{n))^'[n)(^ - So,{n)f + \ix - 6I(„) (x - 6l(„))^)dx. 

Write 

3 1 

-^(x - 6lo,(n))S-i^(2; - 6'o,(„))^ + -{x - 6'(„))S-J^(x - 6'(„))^ 
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-^{x - 6'o,(n))S(^^(x - 6'o,(n)) + -;^{Go,{n) - ^(n))S(^)X 
1 1 

- ^o,(n))S^)(a; - 6'o,(„)) + 2(^0,(71) - ^(n))S^„)(x - 6*0,(71)) 

+ ^(6*0,^ - V))^(n)(^0,(n) - 0{n)f 
-[{x - 6'o,(„))i;^-)(x - 0o,(n))^ - (^0,(n) " ^(n))^(n)(^ ~ ^0,(n))^ 



1 



+ ^(6*0,(71) - 6'(n))2(„^)(6'o,(„) - 6'(n))'^) + g(6'o,(n) " 6'(n))2(„^) (6'o,(n) " ^'(n))'^ 

13 1 3 13 

= -2(^~2^°'^"^"^2^^")^^H^^~2^°'('')'^2^(")^'^'^8^'^°'^"^~^^"^^^S^^^^ 
Hence we obtain 

It then follows from condition (a) that there exists a positive constant $b_3$ not
depending on $n$ such that

The constant $c$ is now chosen so large that $b_3\|\theta_{(k)} - \theta_{0,(n)}\|_2^2 \le b_3\|\theta_{(k)} - \theta_{0,(k)}\|_2^2 + 2^{-1}\varepsilon_n^2$. Since the support of $\Pi_n$ is $\{(\theta_1, \theta_2, \ldots) : \theta_i = 0 \text{ for } i \ge k+1\}$,

we get 

\[ \Pi_n(W_n(\theta_0, \varepsilon_n)) \ge \Pi_n\bigl(\theta_{(k)} : \|\theta_{(k)} - \theta_{0,(k)}\|_2 < (2b_3)^{-1/2}\varepsilon_n\bigr) \]

and hence 

^n{0{k) '■ \ \(^(k) - 6*0,(^)112 < 3je„) ^ n„(6l(fc) : ||6'(fc) - 6'o,(fc)||2 < 3je„) 



< 



nn(W„(^0,£n)) ~ ^niOik) ■ ll^(fc) " ^(fc)l|2 < (263)-V2£„) 

^ ■/||e(fe)-eo,(fe)||2<3je„^^P ( - ¥{k)^k^0fk))dO{k) 

ii|e(fc)-eo,wl|2<(263)-i/2e„exp (- i^(;,)E-l^J^))de(fe) 

•^|e(fc)-6'0,(fc)||2<3j6n^^(fc) 
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max eyip (-6(f,)Tj,^6T,s] ^'^'^^ 



< eijj'"^" max exp (-0(^)^1^07,^.) 

for all large $j$ and $n$. On the other hand, it turns out from condition (c) that
there exists $b_4 > 0$ such that for any $\theta_{(k)} = (\theta_1, \ldots, \theta_k)$ with $\|\theta_{(k)} - \theta_{0,(k)}\|_2 <
(2b_3)^{-1/2}\varepsilon_n$, we have

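\begin{align*}
\exp\Bigl(\tfrac12\theta_{(k)}\Sigma_k^{-1}\theta_{(k)}^T\Bigr)
&\le \exp\Bigl(b_4 k\sum_{i=1}^{k}\theta_i^2\, i^{2\gamma}\Bigr) \\
&\le \exp\Bigl(2b_4 k\sum_{i=1}^{k}(\theta_i-\theta_{0,i})^2\, i^{2\gamma} + 2b_4 k\sum_{i=1}^{k}\theta_{0,i}^2\, i^{2\gamma}\Bigr) \\
&\le \exp\Bigl(2b_4 k^{2\gamma+1}\sum_{i=1}^{k}(\theta_i-\theta_{0,i})^2 + 2b_4 k\sum_{i=1}^{\infty}\theta_{0,i}^2\, i^{2\gamma}\Bigr)
\end{align*}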

for all large $j$ and $n$, where the second inequality follows from the inequality
$(s + t)^2 \le 2s^2 + 2t^2$ for all $s, t \in \mathbb{R}$. Therefore, we have proved the required
inequality and the proof of Theorem [7] is complete.

□ 



4.3. Prior based on uniform distributions. Assume, just as in Section 3.1,
that the vector $(X_1, X_2, \ldots, X_n)$ of independent variables $X_i$ has a density $\prod_{i=1}^n p_{\theta,i}(x_i)$
relative to the product measure $\mu_1 \times \mu_2 \times \cdots \times \mu_n$ on $\mathfrak{X}_1 \times \mathfrak{X}_2 \times \cdots \times \mathfrak{X}_n$. We
follow the notations of Section 3.1. By means of the componentwise Hellinger
upper bracketing numbers for $\Theta$, Ghosal and van der Vaart [5] have obtained
an in-probability convergence rate theorem for priors based on discrete
distributions. Their result can be extended to an almost sure assertion in terms
of Theorem O In the following we give an almost sure result for priors based 
on uniform distributions, which gives us an opportunity to adopt the average
Hellinger metric $d_n^0(\theta_1,\theta_2) = \bigl(\frac1n\sum_{i=1}^n H_i(p_{\theta_1,i}, p_{\theta_2,i})^2\bigr)^{1/2}$ instead of the
componentwise Hellinger upper bracketing numbers. This also extends a result for
i.i.d. observations given by Xing ([16], Section 3.2).
Let $c > 1$ and let $d_n$ be metrics on $\Theta$. Assume that $\Theta_{c,n}$ for $n = 1, 2, \ldots$ are
subsets of $\Theta$ such that $\frac1n \sum_{i=1}^n H_{*,i}(p_{\theta_1,i}, p_{\theta_2,i})^2 \le c^2 d_n(\theta_1,\theta_2)^2$ for all $\theta_1, \theta_2 \in
\Theta_{c,n}$. By the definition of $H_{*,i}$ we have $d_n^0 \le \sqrt{3}\, c\, d_n$ on $\Theta_{c,n}$. Note that $d_n$
can be taken as a constant multiple of $d_n^0$ in the case that $H_{*,i}(p_{\theta_1,i}, p_{\theta_2,i}) \le
H_i(p_{\theta_1,i}, p_{\theta_2,i})$ for $\theta_1, \theta_2$ in $\Theta$ and $i = 1, 2, \ldots, n$. Given $\varepsilon_n > 0$, we assume
that $\{B_1, \ldots, B_{K_n}\}$ is a partition of $\Theta_{c,n}$ such that for each $B_i$ there exists $b_i$
in $\Theta_{c,n}$ with $B_i \subset \{\theta \in \Theta_{c,n} : d_n(b_i,\theta) < \varepsilon_n/2c\}$. Let $\Pi_n$ be a prior distribution
supported on @c,n such that n„(i?j) = l/i^n for z = 1, 2, . . . , Corollary H] 
implies the following result. 

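Theorem 8. Suppose that $\theta_0 \in \Theta_{c,n}$ for all $n$ and suppose that $\log K_n + \log n =
O(n\varepsilon_n^2)$ as $n \to \infty$. Then for each large $r$,

\[ \Pi_n\bigl(\theta \in \Theta : d_n^0(\theta,\theta_0) \ge r\varepsilon_n \mid X_1, X_2, \ldots, X_n\bigr) \longrightarrow 0 \]

almost surely as $n \to \infty$.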

Proof. Take 0„ = Qc,n for all n. Then condition (iii) of Corollary H] is trivially 
fulfilled. For 6 = l/(2^/3c2) we have that for any given ci > and all large j 
and n, 

N{5jen, {eeQn-. jSn < d'^{9 , 9o) < 2jen}, d°) < 0n, d°) 

<N{'^^,Q^,dn) <Kn<e'^^"^'", 

where the last inequality follows from log-ftr„ = O(ne^). This implies condi- 
tion (i) of Corollary [H To see condition (ii) , by 6*0 G 0c,n we can take 6^^ G 
such that dn{big,9o) < en/2c. Then, for all 9 G Bi^ we have 

1 " 

-Y,H,,^{p9o,^,Pe,if < C^dn{9o,9f < {dn{9o,bi,) + dn{h„9)f < 6^, 

" i=l 

which implies that Wni9o,en) contains the whole set Bi^ and hence 
Un{Wn{9o,en)) > Iln{Bi,) = 1/Kn > e-'^^i^n^ for any given C2 > and 
all large j and n. So we have verified condition (ii) and the proof of Theorem 
[8] is complete. □ 

Example (Nonparametric Poisson regression). Assume that $U > L > 0$ are
two given constants. We consider Poisson distributed independent random
variables $X_1, X_2, \ldots, X_n$ with parameters $\theta(z_1), \theta(z_2), \ldots, \theta(z_n)$, where $\theta :
\mathbb{R} \to [L, U]$ is an unknown increasing link function and $z_1, z_2, \ldots, z_n$ are
one-dimensional covariates. The joint mass function of $(X_1, X_2, \ldots, X_n)$ is given
by $\prod_{i=1}^n p_{\theta,i}(x_i)$ with $p_{\theta,i}(x_i) = e^{-\theta(z_i)}\, \theta(z_i)^{x_i} / x_i!$. For $a, b \in [L, U]$ we have

x={) V a;! 

E(e 202— e 252)^/2 e"~°-a^ 1\ 

<(a-6)2e 2 _ ^(^-j <(a-6)2, 

z=0 

where the first inequality follows from the inequality |e~2a2 — e~2 52| < 
\a — b\e~i'{U^ + xU^~^) for all a,b £ [L,U]. This implies that 
$\frac1n \sum_{i=1}^n H_{*,i}(p_{\theta_1,i}, p_{\theta_2,i})^2 \lesssim \int (\theta_1 - \theta_2)^2\, dP_n^z$ for all link functions $\theta_1$ and $\theta_2$, where
$P_n^z = n^{-1}\sum_{i=1}^n \delta_{z_i}$ denotes the empirical distribution of $z_1, z_2, \ldots, z_n$. So one
can use the $L_2(P_n^z)$-metric to produce the partition $\{B_1, \ldots, B_{K_n}\}$ of the space
of link functions. By Theorem 2.7.5 of [10] we know that $\log K_n \lesssim \varepsilon_n^{-1}$. Letting
$\varepsilon_n^{-1} = n\varepsilon_n^2$ we obtain $\varepsilon_n = n^{-1/3}$, and hence by Theorem [8] the posterior
based on uniform distributions converges almost surely at the rate $\varepsilon_n = n^{-1/3}$
with respect to the metric $d_n^0$, which is the minimax rate for this model. The
in-probability convergence rate $n^{-1/3}$ for the posterior based on discrete
distributions has been obtained in Section 7.1.1 of Ghosal and van der Vaart [5].
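
The rate in this example comes from balancing an entropy bound of order $1/\varepsilon$ against $n\varepsilon^2$. As a minimal sketch, the balance equation $1/\varepsilon = n\varepsilon^2$ can be solved numerically and compared with $n^{-1/3}$; the function name and the sample sizes are hypothetical, and the constant hidden in the entropy bound is taken to be one for illustration.

```python
def balance_rate(n, tol=1e-12):
    """Solve 1/eps = n * eps^2 for eps in (0, 1] by bisection (requires n >= 1)."""
    lo, hi = 1e-12, 1.0
    # g(eps) = n * eps^3 - 1 is increasing in eps and changes sign on (0, 1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if n * mid ** 3 < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed illustrative sample sizes.
rates = {n: balance_rate(n) for n in (10, 1000, 10**6)}
```

A different constant in the entropy bound changes the solution only by a constant factor, so the order $n^{-1/3}$ is unaffected.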

It is worth pointing out that in this example the supremum norm $\|p_{\theta_1,i}/p_{\theta_2,i}\|_\infty$
may not be finite. Therefore, the approach of determining prior concentration
rates by means of $H(p_{\theta_1,i}, p_{\theta_2,i})\, \|p_{\theta_1,i}/p_{\theta_2,i}\|_\infty$ in Ghosal, Ghosh and
van der Vaart [1] cannot be applied in this case, but the modified Hellinger
distance $H_*(p_{\theta_1,i}, p_{\theta_2,i})$ works well. A similar argument holds even for the
infinite-dimensional normal model.
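
For orientation, the ordinary Hellinger distance between two Poisson laws has a closed form, $H^2 = 2 - 2\exp\bigl(-\tfrac12(\sqrt a - \sqrt b)^2\bigr)$, and is bounded by $(a-b)^2/(4L)$ when $a, b \ge L$. The sketch below checks both statements numerically. It concerns the standard distance $H$ only, not the modified distance $H_*$ used in the example; the function names, the truncation of the series and the sample values are assumptions for illustration.

```python
import math

def poisson_pmf(x, lam):
    """Poisson(lam) mass at the nonnegative integer x, computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def hellinger_sq_poisson_numeric(a, b, terms=200):
    """H^2 = 2 - 2 * sum_x sqrt(p_a(x) * p_b(x)), series truncated at `terms`."""
    affinity = sum(math.sqrt(poisson_pmf(x, a) * poisson_pmf(x, b))
                   for x in range(terms))
    return 2.0 - 2.0 * affinity

def hellinger_sq_poisson_closed(a, b):
    """Closed form 2 - 2*exp(-(sqrt(a) - sqrt(b))^2 / 2)."""
    return 2.0 - 2.0 * math.exp(-0.5 * (math.sqrt(a) - math.sqrt(b)) ** 2)

# Assumed illustrative values with L = 1 <= a, b <= U = 5.
L_bound, a, b = 1.0, 1.5, 4.0
numeric = hellinger_sq_poisson_numeric(a, b)
closed = hellinger_sq_poisson_closed(a, b)
quadratic_bound = (a - b) ** 2 / (4.0 * L_bound)
```

The quadratic bound follows from $1 - e^{-u} \le u$ together with $(\sqrt a - \sqrt b)^2 = (a-b)^2/(\sqrt a + \sqrt b)^2$.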

5 Appendix 

Proof of Proposition [1]. Given $\delta > 1$, by the definition of the Hausdorff
$\alpha$-constant and Assumption [1] there exist pairwise disjoint subsets
$B_1, B_2, \ldots, B_N$ of $\Theta_1$ such that (1) $\bigcup_{k=1}^{N} B_k = \{\theta \in \Theta_1 : d_n(\theta,\theta_0) > \varepsilon\}$;
(2) each $B_k$ is contained in some ball of $e_n$-radius not exceeding $\varepsilon$; (3)
$\sum_{k=1}^{N} \Pi_n(B_k)^\alpha \le \delta\, C(\varepsilon, \{\theta \in \Theta_1 : d_n(\theta,\theta_0) > \varepsilon\}, \alpha, e_n)$; (4) there exist test
functions $\varphi_k$ such that $P_{\theta_0}^{(n)}\varphi_k \le e^{-Kn\varepsilon^2}$ and $P_\theta^{(n)}\varphi_k \ge 1 - e^{-Kn\varepsilon^2}$ for all $\theta$ in
$B_k$. Then by the inequality $(x + y)^\alpha \le x^\alpha + y^\alpha$ for all $x, y \ge 0$, we get



fe=l -^^k 
fe=l -^^k 



It turns out from Holder's inequality and Fubini's theorem that 

k=i -^^k 

ifc=l -^^fc fe=l 

< ,5e-(i-")^"^'C(£,{0 G Gi : dn{9,9o) > £},a,e„). 

To estimate $L_2$, we deal with $1/2 \le \alpha < 1$ and $0 < \alpha < 1/2$ separately. In the
case of $1/2 \le \alpha < 1$ we have $0 \le (2\alpha - 1)/\alpha < 1$ and by Hölder's inequality,

k=i -^^k 

J B}, 

a 2Q — 1 
^ <)(x("))n„(d^))^}" 



it=i 
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fc=i fc=i 

In the case of $0 < \alpha < 1/2$ we have $0 \le (1 - \varphi_k)^{1-\alpha} \le (1 - \varphi_k)^{\alpha} \le 1$ and hence
by Hölder's inequality,



AT, 

< 

fc=l 



A:=l •^^'^ k=l 

Thus for any $0 < \alpha < 1$ we have obtained the required inequality for $K_1 = 2\delta$
and $K_2 = \alpha K$ if $0 < \alpha < 1/2$ and $K_2 = (1 - \alpha) K$ if $1/2 \le \alpha < 1$. Finally,
letting $\delta \searrow 1$, we conclude the proof of Proposition [1]. □
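
The proof above uses the elementary inequality $(x + y)^\alpha \le x^\alpha + y^\alpha$ for $x, y \ge 0$ and $0 < \alpha \le 1$, which extends by induction to any finite sum. The following is a small numerical sanity check of that extension on randomly generated values; the function name, the seed and the sample sizes are arbitrary choices made for illustration.

```python
import random

def power_sum_gap(values, alpha):
    """Return sum(v^alpha) - (sum v)^alpha, which is nonnegative for 0 < alpha <= 1."""
    return sum(v ** alpha for v in values) - sum(values) ** alpha

random.seed(0)
gaps = [
    power_sum_gap([random.random() for _ in range(5)], alpha)
    for alpha in (0.1, 0.5, 0.9)
    for _ in range(100)
]
smallest_gap = min(gaps)
```

This is the step that lets a power of an integral over a union of disjoint sets be bounded by the sum of the powers of the integrals over the individual sets.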

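Proof of Proposition [2]. Take nonempty disjoint subsets $B_j$, $j = 1, 2, \ldots, N$,
of $\Theta$ such that $\sum_{j=1}^{N} \Pi_n(B_j)^\alpha \le 2C(\delta\varepsilon, \{\theta \in \Theta_1 : d_n^1(\theta,\theta_0) > \varepsilon\}, \alpha, d_n^1)$,
$\bigcup_{j=1}^{N} B_j = \{\theta \in \Theta_1 : d_n^1(\theta,\theta_0) > \varepsilon\}$ and $d_n^1$-diameters of all $B_j$ do not exceed
$2\delta\varepsilon$. Then we have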



^i6ieei:dl(0,6'o)>e ^ 
N f N 



j=l ■^^J j=l 

< 2C{5e,{e G 91 : 4(0,0o) > e},a,d\) max <Y M^t^jj ". 
where $I_j(X^{(n)}) = \Pi_n(B_j)^{-1} \int_{B_j} p_\theta^{(n)}(X^{(n)})\, \Pi_n(d\theta)$ is the integral mean of the
likelihood $p_\theta^{(n)}(x^{(n)})$ and hence is a density function. With a slight abuse of
notation we also let $I_j$ stand for the corresponding parameter of this integral
mean. Take $\theta_j \in B_j$ for each $j$. By Jensen's inequality for $d_n^1(\cdot,\theta_j)^2$ we have
$d_n^1(I_j,\theta_j) \le 2\delta\varepsilon$ and thus $d_n^1(I_j,\theta_0) \ge d_n^1(\theta_j,\theta_0) - d_n^1(I_j,\theta_j) \ge (1 - 2\delta)\varepsilon$. Take

an nonnegative integer m with < 2*" < From Holder's inequality it 
turns out that for each j, 



2-a 



which, by repeating the above procedure m — 1 more times, does not exceed 



< g-2-'"(l-2<5)2an£2 ^ ^-1(1-0) (1-25)2 ne2 

which completes the proof of Proposition [2]. □

Proof of Proposition [3]. Denote $S = \{\theta \in \Theta_1 : d_n^1(\theta,\theta_0) > \varepsilon\}$. Assume first

< (3 < 1/2. By Holder's inequality and the inequality 1 — x < , we have 



2P)a 



s 
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S 



1 



'S 

which gives the required inequality when $0 < \beta \le 1/2$. If $1/2 < \beta < 1$ we take
$p = \frac{1}{2-2\beta}$ and $q = \frac{1}{2\beta-1}$. It then follows from Hölder's inequality that

The proof of Proposition [3] is complete. □ 

To prove Theorem [1] we need two simple lemmas. 
Lemma 1. Let $\varepsilon > 0$ and $c > 0$. Then the inequality

holds for all n. 

Proof. Without loss of generality, we may assume that $\Pi_n(W_n(\theta_0, \varepsilon)) > 0$.
From Jensen's inequality and Chebyshev's inequality it follows that



V Iin[Wn{Po,e)) Jw^{eo,e) J 
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JW„iOo,e) " gp 
=.2^3, 



e-='(2+-)n4VF„(0o,e)) 
where 



+ 2 



which implies the required inequality and the proof of Lemma [1] is complete.

□ 

Lemma 2. Under Assumption\^ the inequality 
^ Je€ei:d„{e,eo)>re 

oo 
j=[r-l] 

holds for all $r > 2$, $\varepsilon > 0$, $\Theta_1 \subset \Theta$ and for all $n$ large enough.

Proof Note that {9 G Bi : dn{9,9o) > re} C {9 G Bi : dn{e,eo) > [r]e} = 
^fL[r-i]{^ e ©1 : < dn{9,eo) < 2je} := U'^^^.^^Oij. Using the inequahty 
(x + y)" < + for all x,y > and Assumption [2] for 0i = Qij we obtain 

R(:)f [ <)(xW)n„(d0) 



6»e0i:dn(e,6lo)>re 



oo „ 
j=[r-l] 
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<Ki Yl e-^^"^' ' Cije, {9 G : dn{9, 60) > je}, a, enf' 

j=[r-l] 
00 

= Ki e-'''^^'''c{je,{9 G 81 : je < d„(0,0o) < 2ie}, a, e„)^3. 

j=[r-l] 

The proof of Lemma [2] is complete. □ 

Proof of Theorem [1]. Take a constant $c > 1/c_0$. Then $e^{-cn\varepsilon_n^2} \le e^{-cc_0\log n} =
1/n^{cc_0}$ and hence $\sum_{n=1}^{\infty} e^{-cn\varepsilon_n^2} < \infty$. By Lemma [1] and the first Borel-Cantelli
lemma, we get that for almost all X^^^ the inequality 

Je 

holds for all large n. Thus, for any 5 > we have 

= p(;)(r"n„(0 G e„ : d„(0,0o) > r£„|x("))" > i) 

^ Jeee„:d„{e,eo)>re„ 
which, by Lemma [2] and the inequality (2), does not exceed 



j=[r-l] 



00 

2 



i=[r-l] 

_^_^g(ci-i<'2)[r-l]ne2+a(3+2c)ne2 j^^^(^ci-K2)[r-l]co+a{3+2c)cQ 
§a(^l _ g(ci-_fs'2)n4) — ga(^l _ ^(ci~ii'2)co) 

where the second-to-last inequality holds for all large $r$ and the last inequality holds
for all large $n$. Since the last exponent is strictly less than $-1$ for all large $r$,
by the first Borel-Cantelli lemma we obtain that for almost all X^"'\ 

\[ \Pi_n\bigl(\theta \in \Theta_n : d_n(\theta,\theta_0) \ge r\varepsilon_n \mid X^{(n)}\bigr) < \delta \]
if $n$ is large enough, which yields the first assertion.
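
The sum over the shells $j \ge [r-1]$ in the argument above is a geometric series, as the denominators of the form $1 - e^{(c_1 - K_2)n\varepsilon_n^2}$ in the display indicate. A minimal sketch of the closed form follows, under the assumption that $c_1 < K_2$ so that the common ratio is below one; the symbol `a` stands for $(K_2 - c_1)n\varepsilon_n^2$, and the function names and sample values are hypothetical.

```python
import math

def geometric_tail(a, start):
    """Closed form of sum_{j >= start} exp(-a * j) for a > 0."""
    return math.exp(-a * start) / (1.0 - math.exp(-a))

def geometric_tail_partial(a, start, terms=2000):
    """Direct partial sum of exp(-a * j) for j = start, ..., start + terms - 1."""
    return sum(math.exp(-a * j) for j in range(start, start + terms))

# Assumed illustrative values for a and the first shell index.
closed_form = geometric_tail(0.3, 4)
partial_sum = geometric_tail_partial(0.3, 4)
```

Because the tail starts at $j = [r-1]$, its size decays exponentially in $r$, which is what makes the resulting bound summable in $n$ for large $r$.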

To get the second assertion, choose a positive constant b with C2 — | > ^. 

We then follow the above proof, but take c = C2 — | and 6 = e~^"^" instead, 
and note that 

,(n _ 

where by Lemma [1] the second term on the right hand side is dominated by 

2pbnel„nel(3+2c2-b) , . f t \ , ^ 

n„(W„(6'o, ° Jeee\e„:d„{e,eo)>re„ 



Then, using the same argument as above, one can easily prove the second
assertion and the proof of Theorem [1] is complete. □

Using the trivial inequality $C(b\varepsilon_n, \Theta_n, \alpha, e_n) \le C(\varepsilon_n, \Theta_n, \alpha, e_n)$ for $b \ge 1$,
one can similarly prove Theorem [2]. The proof of Theorem [3] is only a slight
modification of the proof of Theorem [1] except that we need to apply Lemma
10 in Ghosal and van der Vaart [5]. The proof of Theorem [4] is completely
similar to the proof of Theorem [1], but instead of an application of Lemma [1]
one needs the following lemma.

Lemma 3. For independent observations $(X_1, X_2, \ldots, X_n)$ we have that the
inequality

holds for all $n$, $\varepsilon > 0$, $c > 0$ and $0 < \beta < 1$.

Proof of Lemma [3]. Similar to the proof of Lemma [1] one can get that



30 



e-^'(|+^)n„,(W„(eo,e)) 



^ JWn{9o,£j ' ^ -ne'e 

- =.2/3, . ^ K 



^2 , 



e"^'(2+-)n4T^„(eo,e)) 
which concludes the proof. □ 

Proof of Proposition [5]. It is no restriction to assume that $n = 2k$ is an even
number. Similar to the proof of Proposition [2] we get that the left side of the
required inequality does not exceed $2C(\delta\varepsilon, \{\theta \in \Theta_1 : d(\theta,\theta_0) > \varepsilon\}, \alpha, d)$ times
2k 

, , r qe{Xo)WPe{X,\Xi_^) 

j{n) i / i=l 



" qi,Ma)X\P«,.(Xi\X,.i) 

i=l 

= max Pn — ; ; — — z — - — ; 



max Jjj, ■^"v"'^/""v""/ yr ij,2t-i yr 

i<j<N ^0 qeo{Xo)Iln{Bj) l\peo{X2t\X2t-i)J \ to Peo{X2t+i\X2t) 

( , ( I,^qeiXo)U4d9) ^ ^ -^ ^ 



max 



i<j<^ \^ V QeoiXoWBj) fJ^pe,iX2t\X2t-i] 



t=0 

where the last inequality follows from Hölder's inequality, the set $B_j$ is defined
in a similar way as that of Proposition [2] and we have used the notations
$\prod_{i=1}^{0} p_\theta(X_i|X_{i-1}) = 1$ and

\[
I_{j,s} = \frac{\int_{B_j} q_\theta(X_0) \prod_{i=1}^{s+1} p_\theta(X_i|X_{i-1})\, \Pi_n(d\theta)}{\int_{B_j} q_\theta(X_0) \prod_{i=1}^{s} p_\theta(X_i|X_{i-1})\, \Pi_n(d\theta)}
\]

for $s = 0, 1, \ldots, 2k - 1$. We also let $I_{j,s}$ stand for the parameter of the
corresponding integral means. Take $\theta_j \in B_j$ for each $j$. From Jensen's inequality
and the assumption $a_0 r(X_s) \le p_\theta(X_s|X_{s-1}) \le a_1 r(X_s)$ it turns out that

d{ij,s,ej? = j^j^ {\fhs - ,Jpe,{Xs+i\Xs)f dfi{x,+i)du{x,) 

< [ [ [ {VPe{Xs+i\X,) - ^pe.iX,+i\X,))Uti{X,+i) 



-'J 

s 



qe{Xo)llp0{Xi\Xi^i) 

^ du{x,)Un{de) 



Js.qe{Xo)IlPe{Xi\Xi_i)nn{de) 

^ i=l 



<^ [ [ [ - Jp0,{Xs+i\Xs)f d^i{Xs+l)du{x,) 

qe{Xo)U Pe{Xi\Xi_,) 2 
— Ii.n{d9) < 



Ib IdiXo) n PeiX,\Xi_i) UnidO) 
^ 1=1 

s-l 

qe{Xo) n PeiXi\Xi_i) 

^ d{e,e,r ^ u^{de)<'-^ 

-""^ qe{X^) n Pe{X,\Xi_^) n^idO) "° 

^ i=l 



Thus, „ 9^) < ^ and d{Ij,s, ^o) > d{ej, 60) - d{Ij,s, Oj) > (1 - '-^)s. 
Write 

J X^''-^ \ J X\J X P9o{X2k\X2k-l) ) ] 
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qe,{Xo)UniB,) 11 pe,iX2t\X2t-i) 

2k~3 

e,{Xo) n Peo{Xs+i\Xs) dfi{Xo)dfi{Xi) . . . dfi{X2k-2). 



s=0 

Take an nonnegative integer m with < 2™' < j^^- Repeating the proof 
of Proposition [2] (applying the same procedure m + 1 times instead of m times) 
we get that 

Ij,2k-l ^2a ^^^j^^^A PeoiX2k-l\X2k~2) dn{X2k-l) 




x\Jx Peo{X2k\X2k-i) 

< \|pOo{X2k\X2k-l)fd^,{X2k) 

peg{X2k~i\X2k-2) dfj,{X2k-i) < 
'i^-^ j^J^{V^j,'ik-i-\/peoiX2k\X2k-i)ypeoiX2k-i\X2k-2)di^iX2k)dii{X2k-i] 

- (l - ? XX ^V^^^^ " \/pdo(^2k\X2k~i) f diJi{X2k)du{X2k-i)^ ' 



Hence we have 



7x2.-1 V TOn„(i?j) l}^pe,{X2t\X2t-i)) 

2k-3 

qe„{Xo) Y\_ Peo{Xs+i\Xs) dn{Xo)dn{Xi) . . . d^i{X2k-2)- 
Repeating the same argument k — 1 times one can get that 

Jx V Q9o[Xo)^i-n(Bj) J 

^ ^.^2.2/ f jB <leiXo)Un{de) \2« 
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Similarly, we have 

Hence we have proved the required inequality and the proof of Proposition [5] 
is complete. □ 

The proof of Theorem [6] is completely similar to that of Theorem [1] except 
that we apply Proposition [5] and the following lemma. 

Lemma 4. If there exists a constant $a_1 > 1$ such that $\int_A p_{\theta_0}(y|x)\, d\mu(y) \le
a_1 \int_A d\nu(y)$ for all $x$ and $A \in \mathcal{A}$, then the inequality



holds for all $n$, $\varepsilon > 0$ and $c > 0$.

Proof of Lemma [4]. Similar to the proof of Lemma [1] we have that the left
hand side of the required inequality does not exceed



^ 1=1 ' 

So it suffices to prove that f ft P£<i™zi))^ < e^"^' for aU 

^ 1=1 ' 

$\theta \in W_n(\theta_0, \varepsilon)$. We assume without loss of generality that $n$ is an even number,
say $n = 2k$. Write

Qeoi^o) TT PeoiXjlXj^i) _ ggoC^o) yr P6io(^2j|^2j-i) yr Pep (^2j-i 1^2^-2) 
qg{Xo) y peiXi\X,^i) ~ qeiXo) pe{X2j\X2j-i) Pe(X2,--i|X2,_2) ' 

From Holder's inequality it then turns out that 

pin) ( QeoiXp) yr pgp (X^ ) \ 3 

^0 V qg{Xo) l\ pe{Xi\Xi^i) J 

< (pin)( (Xq) -A- P0O {X2j I X2j „ 1 ) \ I \ ^ f In) ( A PBq {X2j-l\X2j-2) \ l\ ^ 
"V '° We(Xo) Pe(X2,|X2,-l) M '° Vii^,e(X2,_l|X2,_2)v' y 



1 
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:= AkBk- 

Hence by Fubini's theorem we get that $A_k$ is equal to



dfi{Xo)dfi{Xi) . . . dn{X2k) 



I 




qe{Xo)2 -^-^ pe{X2j\X2j-i)2 

where by the proof of Lemma 1 in Xing [16] we have

f n Pe^AX.u\X2u-i)i ^^^^^ peM2k^i\X2,^2) rfM^2.-i) 
JxKJx pe{X2k\X2k-i)^ / 

= / {^ + ^H^\p0^{-\X2k-i),Pe{-\X2k-i))^^Peo{X2k-i\X2k-2)dii{X2k-i) 

= 1+1 ^H^{peQ{-\X2k-i),Pe{-\X2k-i))'^ Peo{X2k-i\X2k-2)dp{X2k-i) 
Jx ^ 

<l+ I ^H,{pe,{-\X2k-i),Pe{-\X2k-i))^di^{X2k~i) 



Thus, we have obtained that Ak < e 4^ ^•(psq'P*) Ak^i- Repeating the same 
argument k — 1 times and using ai > 1 one can get 

Ak < e^'^'^P^o^^o^' ( [ ^^^l^dpiXo) 
^Jx qeiXo)^ 

Similarly, we can get that Bk < e^'^^'^Poo'Po^^ . Therefore AkBk < 
^lm{q,^,qef+^-^nH,{pe^,Pef < ^^ne' q ^ W„H6'o,e), and the proof of 

Lemma [4] is complete.

□ 
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